on the identification and characterization of those chromosomal sequences which encode a protein product are 
particularly relevant to diagnostic and therapeutic uses. In some instances, the sequences used in such therapeutic 
or diagnostic techniques may be sequences which encode proteins which are secreted from the cell in which they are 
synthesized. Sequences encoding secreted proteins, as well as the secreted proteins themselves, are particularly 
valuable as potential therapeutic agents. 
Such proteins are often involved in cell to cell communication and may be responsible for producing a clinically 
relevant response in their target cells. In fact, several secretory proteins, including tissue plasminogen activator, 
G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and 
interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions including 
acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney 
carcinoma, chemotherapy-induced neutropenia and multiple sclerosis. For these reasons, extended cDNAs encoding 
secreted proteins or portions thereof represent a valuable source of therapeutic agents. Thus, there is a need for 
the identification and characterization of secreted proteins and the nucleic acids encoding them. 
[0008] In addition to being therapeutically useful themselves, secretory proteins include short peptides, called 
signal peptides, at their amino termini which direct their secretion. These signal peptides are encoded by the 
signal sequences located at the 5' ends of the coding sequences of genes encoding secreted proteins. These signal 
peptides can be used to direct the extracellular secretion of any protein to which they are operably linked. In 
addition, portions of the signal peptides, called membrane-translocating sequences, may also be used to direct the 
intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy strategies in 
which it is desired to deliver a particular gene product to cells other than the cell in which it is produced. 
Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In 
such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing 
the number of undesired proteins from which the desired protein must be selected. Thus, there exists a need to 
identify and characterize the 5' portions of the genes for secretory proteins which encode signal peptides. 
[0009] Sequences coding for non-secreted proteins may also find application as therapeutics or diagnostics. In 
particular, such sequences may be used to determine whether an individual is likely to express a detectable 
phenotype, such as a disease, as a consequence of a mutation in the coding sequence for a non-secreted protein or 
for a secreted protein. In instances where the individual is at risk of suffering from a disease or other 
undesirable phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype may be 
corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the undesirable phenotype 
results from overexpression of the protein encoded by the coding sequence, expression of the protein may be reduced 
using antisense or triple helix based strategies. 

[0010] The secreted or non-secreted human polypeptides encoded by the coding sequences may also be used as 
therapeutics by administering them directly to an individual having a condition, such as a disease, resulting from a 
mutation in the sequence encoding the polypeptide. In such an instance, the condition can be cured or ameliorated by 
administering the polypeptide to the individual. 

[0011] In addition, the secreted or non-secreted human polypeptides or portions thereof may be used to generate 
antibodies useful in determining the tissue type or species of origin of a biological sample. The antibodies may 
also be used to determine the cellular localization of the secreted or non-secreted human polypeptides or the 
cellular localization of polypeptides which have been fused to the human polypeptides. In addition, the antibodies 
may also be used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or 
a target polypeptide which has been fused to the human polypeptide. 

[0012] Public information on the number of human genes for which the promoters and upstream regulatory regions 
have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating 
such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically 
too short to be utilized as probes for isolating promoters from human genomic libraries. Recently, some approaches 
have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross et al., 
Nature Genetics 6: 236-244, 1994). The second consists of isolating human genomic DNA sequences containing SpeI 
binding sites by the use of SpeI binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these 
approaches have their limits due to a lack of specificity or because they are not universally applicable since only 
a limited number of promoters have either a CpG island or a SpeI recognition site and because SpeI binding sites 
are not specifically found in promoter regions. Thus, there exists a need to identify and systematically 
characterize the 5' portions of the genes. 

[0013] The present 5' ESTs may be used to efficiently identify and isolate 5'UTRs and upstream regulatory 
regions which control the location, developmental stage, rate, and quantity of protein synthesis, as well as the 
stability of the mRNA. Once identified and characterized, these regulatory regions may be utilized in gene therapy 
or protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, 
reduce, or prevent the synthesis of undesirable gene products. 

[0014] In addition, ESTs containing the 5' ends of protein genes may include sequences useful as probes for 
chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the 
sequences upstream of the 5' coding sequences of genes. 

Summary of the invention 

[0015] The present invention relates to purified, isolated, or enriched 5' ESTs which include sequences derived 
from the authentic 5' ends of their corresponding mRNAs. The term "corresponding mRNA" refers to the mRNA which 
was the template for the cDNA synthesis which produced the 5' EST. These sequences will be referred to hereinafter 
as "5' ESTs." The present invention also includes purified, isolated or enriched nucleic acids comprising contigs 
assembled by determining a consensus sequence from a plurality of ESTs containing overlapping sequences. These 
contigs will be referred to herein as "consensus contigated ESTs." 

[0016] As used herein, the term "purified" does not require absolute purity; rather, it is intended as a 
relative definition. Individual 5' EST clones isolated from a cDNA library have been conventionally purified to 
electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the 
library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via 
manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a 
cDNA library involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be isolated 
from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently 
isolating individual clones from that library results in an approximately 10^4-10^6 fold purification of the native 
message. Purification of starting material or natural material to at least one order of magnitude, preferably two or 
three orders, and more preferably four or five orders of magnitude is expressly contemplated. 
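
As a rough numerical illustration of the purification levels contemplated above, the following Python sketch converts a fold-purification factor into orders of magnitude. The function name and the example value are assumptions made for illustration only and do not appear in the specification.

```python
import math


def orders_of_magnitude(fold_purification):
    """Return the base-10 order of magnitude of a fold-purification factor."""
    if fold_purification <= 0:
        raise ValueError("fold purification must be positive")
    return math.log10(fold_purification)


# Illustrative value only: a 10,000-fold purification is four orders of magnitude.
print(orders_of_magnitude(10_000))
```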

[0017] As used herein, the term "isolated" requires that the material be removed from its original environment 
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide 
present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting 
materials in the natural system, is isolated. 

[0018] As used herein, the term "enriched" means that the 5' EST is adjacent to "backbone" nucleic acid to which 
it is not adjacent in its natural environment. Additionally, to be "enriched" the 5' ESTs will represent 5% or more 
of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules 
according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, 
viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid 
insert of interest. Preferably, the enriched 5' ESTs represent 15% or more of the number of nucleic acid inserts in 
the population of recombinant backbone molecules. More preferably, the enriched 5' ESTs represent 50% or more of 
the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred 
embodiment, the enriched 5' ESTs represent 90% or more of the number of nucleic acid inserts in the population of 
recombinant backbone molecules. 
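
The percentage thresholds recited above can be expressed as a simple classification. The following Python sketch is offered only as an illustration: the function name, the tier labels, and the insert counts are hypothetical, and the sketch assumes that the number of 5' EST inserts and the total number of inserts in the population have already been counted.

```python
# Thresholds (in percent) taken from the definition of "enriched" above,
# ordered from the most to the least demanding tier. Labels are illustrative.
TIERS = (
    (90, "highly preferred"),
    (50, "more preferred"),
    (15, "preferred"),
    (5, "enriched"),
)


def enrichment_tier(est_inserts, total_inserts):
    """Classify a population of backbone molecules by its share of 5' EST inserts."""
    if total_inserts <= 0:
        raise ValueError("total_inserts must be positive")
    if not 0 <= est_inserts <= total_inserts:
        raise ValueError("est_inserts must lie between 0 and total_inserts")
    for percent, label in TIERS:
        # Integer comparison avoids floating-point rounding at the boundaries.
        if est_inserts * 100 >= percent * total_inserts:
            return label
    return "not enriched"


# Hypothetical counts: 12 of 200 inserts are 5' ESTs (6%).
print(enrichment_tier(12, 200))
```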

[0019] "Stringent", "moderate," and "low" hybridization conditions are as defined below. 

[0020] The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; 
thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does 
not specify or exclude post-expression modifications of polypeptides, for example, polypeptides which include the 
covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly 
encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more 
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur 
naturally in an unrelated biological system, modified amino acids from mammalian systems etc.), polypeptides with 
substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally 
occurring. 

[0021] As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and "polynucleotides" 
include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. 
The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid 
sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to 
refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger 
nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate 
group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The 
term "nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one of the following 
modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, 
or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example 
PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known 
method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any 
purification methods known in the art. 

[0022] The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to refer to 
nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that 
found in double-helical DNA with thymine or uracil residues linked to adenine residues by two hydrogen bonds and 
cytosine and guanine residues linked by three hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995). 

[0023] The terms "complementary" or "complement thereof" are used herein to refer to the sequences of 
polynucleotides which are capable of forming Watson & Crick base pairing with another specified polynucleotide 
throughout the entirety of the complementary region. For the purpose of the present invention, a first 
polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide 
is paired with its complementary base. Complementary bases are, generally, A and T (or A and U), or C and G. 
"Complement" is used herein as a synonym for "complementary polynucleotide", "complementary nucleic acid" and 
"complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their 
sequences and not any particular set of conditions under which the two polynucleotides would actually bind. 
Preferably, a "complementary" sequence is a sequence which has an A at each position where there is a T on the opposite 
strand, a T at each position where there is an A on the opposite strand, a G at each position where there is a C on 
the opposite strand and a C at each position where there is a G on the opposite strand. 
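
The base-pairing rule stated above (A with T or U, C with G, across the entirety of the complementary region) lends itself to a short sequence-only check. The following Python sketch is a minimal illustration, assuming that both strands are written 5' to 3' and pair in antiparallel orientation; the function names and test sequences are hypothetical.

```python
# Watson & Crick partners as recited above; U is accepted as the RNA partner of A.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the base-by-base DNA complement, read in the same left-to-right order."""
    return "".join(DNA_COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the complementary strand written 5' to 3'."""
    return complement(seq)[::-1]


def is_complementary(first, second):
    """True if every base of `first` pairs with `second` over its entire length.

    Both strands are assumed to be written 5' to 3', so `second` is reversed
    before the position-by-position comparison. Only sequence is considered,
    not any hybridization condition.
    """
    if len(first) != len(second):
        return False
    return all(
        (a, b) in PAIRS
        for a, b in zip(first.upper(), reversed(second.upper()))
    )


print(reverse_complement("ATGC"))
print(is_complementary("ATGC", "GCAT"))
```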

[0024] Thus, 5' ESTs in cDNA libraries in which one or more 5' ESTs make up 5% or more of the number of nucleic 
acid inserts in the backbone molecules are "enriched recombinant 5' ESTs" as defined herein. Likewise, 5' ESTs in a 
population of plasmids in which one or more 5' ESTs of the present invention have been inserted such that they 
represent 5% or more of the number of inserts in the plasmid backbone are "enriched recombinant 5' ESTs" as defined 
herein. However, 5' ESTs in cDNA libraries in which 5' ESTs constitute less than 5% of the number of nucleic acid 
inserts in the population of backbone molecules, such as libraries in which backbone molecules having a 5' EST 
insert are extremely rare, are not "enriched recombinant 5' ESTs." 

[0025] In some embodiments, the present invention relates to 5' ESTs which are derived from genes encoding 
secreted proteins. As used herein, a "secreted" protein is one which, when expressed in a suitable host cell, is 
transported across or through a membrane, including transport as a result of signal peptides in its amino acid 
sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g. soluble proteins), or 
partially (e.g. receptors) from the cell in which they are expressed. "Secreted" proteins also include without 
limitation proteins which are transported across the membrane of the endoplasmic reticulum. 

[0026] Such 5' ESTs include nucleic acid sequences, called signal sequences, which encode signal peptides which 
direct the extracellular secretion of the proteins encoded by the genes from which the 5' ESTs are derived. 
Generally, the signal peptides are located at the amino termini of secreted proteins. 

[0027] Secreted proteins are translated by ribosomes associated with the "rough" endoplasmic reticulum. 
Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. 
Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by 
the signal peptide. The signal peptide is typically cleaved following its co-translational entry into the 
endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi 
apparatus. In the Golgi apparatus, the proteins may undergo post-translational modification before entering 
secretory vesicles which transport them across the cell membrane. 

[0028] The 5' ESTs of the present invention have several important applications. For example, they may be used 
to obtain and express cDNA clones which include the full protein coding sequences of the corresponding gene 
products, including the authentic translation start sites derived from the 5' ends of the coding sequences of the 
mRNAs from which the 5' ESTs are derived. These cDNAs will be referred to hereinafter as "full-length cDNAs." These 
cDNAs may also include DNA derived from mRNA sequences upstream of the translation start site. The full-length 
cDNA sequences may be used to express the proteins corresponding to the 5' ESTs. As discussed above, secreted 
proteins and non-secreted proteins may be therapeutically important. Thus, the proteins expressed from the cDNAs 
may be useful in treating or controlling a variety of human conditions. The 5' ESTs may also be used to obtain the 
corresponding genomic DNA. The term "corresponding genomic DNA" refers to the genomic DNA which encodes the 
mRNA from which the 5' EST was derived. 

[0029] Alternatively, the 5' ESTs may be used to obtain and express extended cDNAs encoding portions of the 
protein. In the case of secreted proteins, the portions may comprise the signal peptides of the secreted proteins or 
the mature proteins generated when the signal peptide is cleaved off. 
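
The relationship between a secreted protein precursor, its signal peptide, and the mature protein can be illustrated as a split at the cleavage site. The following Python sketch assumes that the length of the signal peptide is already known; the function name and the amino acid sequence are made up for illustration and do not correspond to any sequence of the invention.

```python
def split_at_cleavage_site(precursor, signal_peptide_length):
    """Split a precursor into (signal peptide, mature protein).

    `signal_peptide_length` is the number of amino-terminal residues removed
    when the signal peptide is cleaved off; it is assumed to be known.
    """
    if not 0 < signal_peptide_length < len(precursor):
        raise ValueError("cleavage site must fall inside the precursor")
    signal_peptide = precursor[:signal_peptide_length]
    mature_protein = precursor[signal_peptide_length:]
    return signal_peptide, mature_protein


# Hypothetical 16-residue precursor with a 10-residue signal peptide.
signal, mature = split_at_cleavage_site("MKALLLVSAAQGTEEV", 10)
print(signal, mature)
```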

[0030] The present invention includes isolated, purified, or enriched "EST-related nucleic acids." The terms 
"isolated", "purified" or "enriched" have the meanings provided above. As used herein, the term "EST-related nucleic 
acids" means the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681, extended cDNAs obtainable using the 
nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681, full-length cDNAs obtainable using the nucleic acids of SEQ ID 
NOs: 24-4100 and 8178-36681 or genomic DNAs obtainable using the nucleic acids of SEQ ID NOs: 24-4100 and 
8178-36681. The present invention also includes the sequences complementary to the EST-related nucleic acids. 
[0031] The present invention also includes isolated, purified, or enriched "fragments of EST-related nucleic 
acids." The terms "isolated", "purified" and "enriched" have the meanings described above. As used herein the term 
"fragments of EST-related nucleic acids" means fragments comprising at least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 
40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides of the EST-related nucleic acids to the extent that 
fragments of these lengths are consistent with the lengths of the particular EST-related nucleic acids being 
referred to. The present invention also includes the sequences complementary to the fragments of the EST-related 
nucleic acids. 
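
The notion of fragments of a given minimum number of consecutive nucleotides, limited by the length of the sequence in question, can be sketched as a sliding window. The following Python example is illustrative only; the function names and the test sequence are hypothetical, and the tuple of minimum lengths simply restates the values recited above.

```python
# Minimum fragment lengths recited in the definition above.
MIN_FRAGMENT_LENGTHS = (
    10, 12, 15, 18, 20, 23, 25, 28, 30, 35,
    40, 50, 75, 100, 200, 300, 500, 1000,
)


def applicable_minimum_lengths(seq_len):
    """Return the recited minimum lengths that a sequence of seq_len can satisfy."""
    return [n for n in MIN_FRAGMENT_LENGTHS if n <= seq_len]


def iter_fragments(seq, length):
    """Yield every run of `length` consecutive nucleotides in `seq`.

    Nothing is yielded when `length` exceeds the length of `seq`.
    """
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]


# Hypothetical 12-nucleotide sequence: three fragments of 10 consecutive nucleotides.
for fragment in iter_fragments("ACGTACGTACGT", 10):
    print(fragment)
```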

[0032] The present invention also includes isolated, purified, or enriched "positional segments of EST-related 
nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As used herein, 
the term "positional segments of EST-related nucleic acids" includes segments comprising nucleotides 1-25, 26-50, 
51-75, 76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-300, 301-325, 326-350, 351-375, 376-400, 
401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-600 and 601-the terminal nucleotide of the 
EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths of the particular 
EST-related nucleic acids being referred to. The term "positional segments of EST-related nucleic acids" also 
includes segments comprising nucleotides 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 
401-450, 450-500, 501-550, 551-600 or 601-the terminal nucleotide of the EST-related nucleic acids to the extent that 
such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred 
to. The term "positional segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-100, 
101-200, 201-300, 301-400, 401-500, 500-600, or 601-the terminal nucleotide of the EST-related nucleic acids to the 
extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids 
being referred to. In addition, the term "positional segments of EST-related nucleic acids" includes segments 
comprising nucleotides 1-200, 201-400, 400-600, or 601-the terminal nucleotide of the EST-related nucleic acids to 
the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic 
acids being referred to. The present invention also includes the sequences complementary to the positional segments 
of EST-related nucleic acids. 
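
The positional segments described above follow a windowing pattern: fixed-size windows up to nucleotide 600, followed by a final segment running from nucleotide 601 to the terminal nucleotide. The following Python sketch shows an idealized, regular version of that pattern. It is not a reproduction of the enumerated lists, which are not perfectly regular, and it assumes that a segment is "consistent with the length" of a sequence only when it lies entirely within the sequence. The function names and example lengths are hypothetical.

```python
def positional_segments(seq_len, step, cap=600):
    """Return 1-based, inclusive (start, end) segments for a sequence of seq_len.

    Windows of size `step` are produced up to position `cap`; a final segment
    from cap + 1 to the terminal nucleotide is added when the sequence is
    longer than `cap`. Windows extending past the end of the sequence are
    omitted (an assumption of this sketch).
    """
    if seq_len < 0 or step <= 0:
        raise ValueError("seq_len must be non-negative and step positive")
    segments = []
    start = 1
    while start + step - 1 <= min(cap, seq_len):
        segments.append((start, start + step - 1))
        start += step
    if seq_len > cap:
        segments.append((cap + 1, seq_len))
    return segments


def extract_segment(seq, start, end):
    """Return the nucleotides at 1-based, inclusive positions start..end."""
    return seq[start - 1:end]


# Hypothetical 650-nucleotide sequence tiled in windows of 200.
print(positional_segments(650, 200))
```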

[0033] The present invention also includes isolated, purified, or enriched "fragments of positional segments of 
EST-related nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As 
used herein, the term "fragments of positional segments of EST-related nucleic acids" refers to fragments comprising 
at least 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 150, or 200 consecutive nucleotides of the positional 
segments of EST-related nucleic acids. The present invention also includes the sequences complementary to the 
fragments of positional segments of EST-related nucleic acids. 

[0034] The present invention also includes isolated or purified "EST-related polypeptides." The terms "isolated" 
or "purified" have the meanings provided above. As used herein, the term "EST-related polypeptides" means the 
polypeptides encoded by the EST-related nucleic acids, including the polypeptides of SEQ ID NOs: 4101-8177. 
[0035] The present invention also includes isolated or purified "fragments of EST-related polypeptides." The 
terms "isolated" or "purified" have the meanings provided above. As used herein, the term "fragments of EST-related 
polypeptides" means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive 
amino acids of an EST-related polypeptide to the extent that fragments of these lengths are consistent with the 
lengths of the particular EST-related polypeptides being referred to. 

[0036] The present invention also includes isolated or purified "positional segments of EST-related 
polypeptides." As used herein, the term "positional segments of EST-related polypeptides" includes polypeptides 
comprising amino acid residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, or 201-the C-terminal 
amino acid of the EST-related polypeptides to the extent that such amino acid residues are consistent with the 
lengths of the particular EST-related polypeptides being referred to. The term "positional segments of EST-related 
polypeptides" also includes segments comprising amino acid residues 1-50, 51-100, 101-150, 151-200 or 201-the 
C-terminal amino acid of the EST-related polypeptides to the extent that such amino acid residues are consistent with 
the lengths of the particular EST-related polypeptides being referred to. The term "positional segments of 
EST-related polypeptides" also includes segments comprising amino acids 1-100 or 101-200 of the EST-related polypeptides 
to the extent that such amino acid residues are consistent with the lengths of particular EST-related polypeptides 
being referred to. In addition, the term "positional segments of EST-related polypeptides" includes segments 
comprising amino acid residues 1-200 or 201-the C-terminal amino acid of the EST-related polypeptides to the extent 
that amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. 
[0037] The present invention also includes isolated or purified "fragments of positional segments of EST-related 
polypeptides." The terms "isolated" or "purified" have the meanings provided above. As used herein, the term 
"fragments of positional segments of EST-related polypeptides" means fragments comprising at least 5, 10, 15, 20, 25, 
30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of positional segments of EST-related polypeptides to the 
extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides 
being referred to. 

[0038] The present invention also includes antibodies which specifically recognize the EST-related polypeptides, 
fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional 
segments of EST-related polypeptides. In the case of secreted proteins, such as those of SEQ ID NOs: 7798-7888, 
antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be 
obtained as described below. Similarly, antibodies which specifically recognize the signal peptides of SEQ ID NOs: 
4101-4729 or 7798-7888 may also be obtained. 

[0039] In some embodiments and in the case of secreted proteins, the EST-related nucleic acids, fragments of 
EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of 
nucleic acids include a signal sequence. In other embodiments, the EST-related nucleic acids, fragments of 
EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of 
nucleic acids may include the full coding sequence for the protein or, in the case of secreted proteins, the full 
coding sequence of the mature protein (i.e. the protein generated when the signal polypeptide is cleaved off). In 
addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, or fragments of positional segments of nucleic acids may include regulatory regions upstream of the 
translation start site or downstream of the stop codon which control the amount, location, or developmental stage of 
gene expression. 

[0040] As discussed above, both secreted and non-secreted human proteins may be therapeutically important. 
Thus, the proteins expressed from the EST-related nucleic acids, fragments of EST-related nucleic acids, positional 
segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be useful in 
treating or controlling a variety of human conditions. 

[0041] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, or fragments of positional segments of nucleic acids may be used in forensic procedures to identify 
individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal gene 
expression. In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments 
of EST-related nucleic acids, or fragments of positional segments of nucleic acids are useful for constructing a 
high resolution map of the human chromosomes. 

[0042] The present invention also relates to secretion vectors capable of directing the secretion of a protein 
of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in 
one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the 
purification of desired proteins. 

[0043] The present invention also relates to expression vectors capable of directing the expression of an 
inserted gene in a desired spatial or temporal manner or at a desired level. Such vectors may include sequences 
upstream of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST- 
related nucleic acids, or fragments of positional segments of nucleic acids, such as promoters or upstream 
regulatory sequences. 

[0044] The present invention also comprises fusion vectors for making chimeric polypeptides comprising a first 
polypeptide and a second polypeptide. Such vectors are useful for determining the cellular localization of the 
chimeric polypeptides or for isolating, purifying or enriching the chimeric polypeptides. 

[0045] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, or fragments of positional segments of nucleic acids may also be used for gene therapy to control or 
treat genetic diseases. In the case of secreted proteins, signal peptides may be fused to heterologous proteins to 
direct their extracellular secretion. 

[0046] Bacterial clones containing Bluescript plasmids having inserts containing the sequence of the 
non-clustered 5'ESTs are presently stored at -80°C in 4% (v/v) glycerol in the inventor's laboratories under the 
designations. The non-clustered 5'ESTs are those which comprise a single EST from a single tissue in the listing of 
Table II. The inserts may be recovered from the stored materials by growing the appropriate clones on a suitable 
medium. The Bluescript DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the 
art such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired the 
plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, 
or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated 
using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with 
primers designed at both ends of the inserted EST-related nucleic acids, fragments of EST-related nucleic acids, 
positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids. The PCR 
product which corresponds to the EST-related nucleic acids, fragments of EST-related nucleic acids, positional 
segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids can then be manipulated 
using standard cloning techniques familiar to those skilled in the art. 

[0047] One embodiment of the present invention is a purified nucleic acid comprising a sequence selected from 
the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences complementary to the 
sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 

[0048] Another embodiment of the present invention is a purified nucleic acid comprising at least 10 consecutive 
nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 
and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 

[0049] Another embodiment of the present invention is a purified nucleic acid comprising at least 15 consecutive 
nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 
and sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 
[0050] A further embodiment of the present invention is a purified nucleic acid comprising the coding sequence 
of a sequence selected from the group consisting of SEQ ID NOs: 24-4100. 

[0051] Yet another embodiment of the present invention is a purified nucleic acid comprising the full coding 
sequences of a sequence selected from the group consisting of SEQ ID NOs: 3721-3811 wherein the full coding 
sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein. 
Still another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a 
sequence selected from the group consisting of SEQ ID NOs: 3721-381 1 which encodes the mature protein. 
[0052] Another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a 

40 sequence selected from the group consisting of SEQ ID NOs: 24-652 and 3721-3811 which encodes the signal 
peptide. 

[0053] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising 
a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177. 

[0054] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising 
a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798-7888. 

[0055] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising 
a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 7798- 
7888. 

[0056] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising 
a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101-4729 
and 7798-7888. 

[0057] Another embodiment of the present invention is a purified nucleic acid at least 15, 18, 20, 23, 25, 28, 30, 
35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides in length which hybridizes under stringent conditions to a 
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and sequences 
complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 

[0058] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence 
selected from the group consisting of the sequences of SEQ ID NOs: 4101-8177. 

[0059] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence 
selected from the group consisting of SEQ ID NOs: 7798-7888. 

[0060] Another embodiment of the present invention is a purified or isolated polypeptide comprising a mature 
protein of a polypeptide selected from the group consisting of SEQ ID NOs: 7798-7888. 

[0061] Another embodiment of the present invention is a purified or isolated polypeptide comprising a signal 
peptide of a sequence selected from the group consisting of the polypeptides of SEQ ID NOs: 4101-4729 and 7798- 
7888. 

[0062] Another embodiment of the present invention is a purified or isolated polypeptide comprising at least 10 
consecutive amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4101- 
8177. 

[0063] Another embodiment of the present invention is a method of making a cDNA comprising the steps of 
contacting a collection of mRNA molecules from human cells with a primer comprising at least 15 consecutive 
nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs: 24-4100 
and SEQ ID NOs: 8178-36681, hybridizing said primer to an mRNA in said collection that encodes said protein, 
reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA, making a second cDNA strand 
complementary to said first cDNA strand, and isolating the resulting cDNA encoding said protein comprising said first 
cDNA strand and said second cDNA strand. 

[0064] Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding 
paragraph. 

[0065] In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide. 

[0066] Another embodiment of the present invention is a method of making a cDNA comprising the steps of 
obtaining a cDNA comprising a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID 
NOs: 8178-36681, contacting said cDNA with a detectable probe comprising at least 15 consecutive nucleotides of a 
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the 
sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 under conditions which permit said 
probe to hybridize to said cDNA, identifying a cDNA which hybridizes to said detectable probe, and isolating said 
cDNA which hybridizes to said probe. 

[0067] Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding 
paragraph. 

[0068] In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide. 

[0069] Another embodiment of the present invention is a method of making a cDNA comprising the steps of 
contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA 
tail of said mRNA, hybridizing said first primer to said polyA tail, reverse transcribing said mRNA to make a first 
cDNA strand, making a second cDNA strand complementary to said first cDNA strand using at least one primer 
comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24- 
4100 and SEQ ID NOs: 8178-36681, and isolating the resulting cDNA comprising said first cDNA strand and said 
second cDNA strand. 

[0070] Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding 
paragraph. 

[0071] In one aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide. 

[0072] In another aspect of the preceding method, the second cDNA strand is made by contacting said first cDNA 
strand with a first pair of primers, said first pair of primers comprising a second primer comprising at least 15 
consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and a third primer having a sequence therein which is included within the sequence of said first primer, 
performing a first polymerase chain reaction with said first pair of primers to generate a first PCR product, 
contacting said first PCR product with a second pair of primers, said second pair of primers comprising a fourth 
primer, said fourth primer comprising at least 15 consecutive nucleotides of said sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, and a fifth primer, wherein said fourth and fifth 
primers hybridize to sequences within said first PCR product, and performing a second polymerase chain reaction, thereby 
generating a second PCR product. 

[0073] One aspect of this embodiment is a purified cDNA obtainable by the method of the preceding paragraph. 

[0074] In another aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide. 

[0075] Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a second 
primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID 
NOs: 24-4100 and SEQ ID NOs: 8178-36681, hybridizing said second primer to said first strand cDNA, and extending 
said hybridized second primer to generate said second cDNA strand. 

[0076] One aspect of the above embodiment is a purified cDNA obtainable by the method of the preceding 
paragraph. 

[0077] In a further aspect of this embodiment said cDNA encodes at least a portion of a human polypeptide. 
[0078] Another embodiment of the present invention is a method of making a polypeptide comprising the steps of 
obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected from the 
group consisting of SEQ ID NOs: 24-4100 or a cDNA which encodes a polypeptide comprising at least 10 consecutive 
amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs: 24-4100, 
inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter, introducing said 
expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA, and isolating 
said protein. 

[0079] Another aspect of this embodiment is an isolated protein obtainable by the method of the preceding 
paragraph. 

[0080] Another embodiment of the present invention is a method of obtaining a promoter DNA comprising the steps 
of obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting 
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to the sequences of SEQ 
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, screening said genomic DNA to identify a promoter capable of 
directing transcription initiation, and isolating said DNA comprising said identified promoter. 

[0081] In one aspect of this embodiment, said obtaining step comprises walking from genomic DNA comprising a 
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the 
sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. In another aspect of this 
embodiment, said screening step comprises inserting genomic DNA located upstream of a sequence selected from the 
group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID 
NOs: 24-4100 and SEQ ID NOs: 8178-36681 into a promoter reporter vector. For example, said screening step may 
comprise identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ 
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ 
ID NOs: 8178-36681 which are transcription factor binding sites or transcription start sites. 

[0082] Another embodiment of the present invention is an isolated promoter obtainable by the method of the 
paragraph above. 

Another embodiment of the present invention is the inclusion of at least one sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and fragments comprising at least 15 consecutive nucleotides of 
said sequence in an array of discrete ESTs or fragments thereof of at least 15 nucleotides in length. In some 
aspects of this embodiment, the array includes at least two sequences selected from the group consisting of SEQ ID 
NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24- 
4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequences. 
In another aspect of this embodiment, the array includes at least five sequences selected from the group consisting 
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ 
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and fragments comprising at least 15 consecutive nucleotides of said 
sequences. 

[0083] Another embodiment of the present invention is an enriched population of recombinant nucleic acids, said 
recombinant nucleic acids comprising an insert nucleic acid and a backbone nucleic acid, wherein at least 5% of said 
insert nucleic acids in said population comprise a sequence selected from the group consisting of SEQ ID NOs: 24- 
4100 and SEQ ID NOs: 8178-36681 and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681. 

[0084] Another embodiment of the present invention is a purified or isolated antibody capable of specifically 
binding to a polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: 4101-8177. 
A purified or isolated antibody capable of specifically binding to a polypeptide comprising at least 10 consecutive 
amino acids of a sequence selected from the group consisting of SEQ ID NOs: 4101-8177. 

An antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide comprising 
a contiguous span of at least 8 amino acids of any of SEQ ID NOs: 4101-8177, wherein said antibody is polyclonal or 
monoclonal. 

[0085] Another embodiment of the present invention is a computer readable medium having stored thereon a 
sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a 
polypeptide code of SEQ ID NOs: 4101-8177. 

[0086] Another embodiment of the present invention is a computer system comprising a processor and a data 
storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of 
a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177. In 
one aspect of this embodiment the computer system further comprises a sequence comparer and a data storage device 
having reference sequences stored thereon. For example, the sequence comparer may comprise a computer program 
which indicates polymorphisms. 

In another aspect of this embodiment, the computer system further comprises an identifier which identifies features 
in said sequence. 

[0087] Another embodiment of the present invention is a method for comparing a first sequence to a reference 
sequence wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 
and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177, comprising the steps of reading said first 
sequence and said reference sequence through use of a computer program which compares sequences and 
determining differences between said first sequence and said reference sequence with said computer program. In some 
aspects of this embodiment, said step of determining differences between the first sequence and the reference 
sequence comprises identifying polymorphisms. 
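
The comparison step described above can be sketched in Python as follows. This is a minimal illustration, assuming the two sequences have already been aligned and have equal length; the function name is hypothetical and the sketch is not the computer program referred to in this embodiment.

```python
from typing import List, Tuple


def find_differences(first: str, reference: str) -> List[Tuple[int, str, str]]:
    """Compare two pre-aligned sequences of equal length.

    Returns one (1-based position, reference residue, first-sequence residue)
    tuple per mismatch; each mismatch is a candidate polymorphism.
    """
    if len(first) != len(reference):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return [(i + 1, r, f)
            for i, (f, r) in enumerate(zip(first.upper(), reference.upper()))
            if f != r]
```

A practical implementation would first align the sequences so that insertions and deletions are also reported; that step is omitted here.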

[0088] Another embodiment of the present invention is a method for identifying a feature in a sequence selected 
from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of 
SEQ ID NOs: 4101-8177 comprising the steps of reading said sequence through the use of a computer program which 
identifies features in sequences and identifying features in said sequence with said computer program. 
[0089] Another embodiment of the present invention is a vector comprising a nucleic acid according to any one of 
the nucleic acids described above. 

[0090] Another embodiment of the present invention is a host cell containing the above vector. 

[0091] Another embodiment of the present invention is a method of making any of the nucleic acids described 
above comprising the steps of introducing said nucleic acid into a host cell such that said nucleic acid is present 
in multiple copies in each host cell and isolating said nucleic acid from said host cell. 

[0092] Another embodiment of the present invention is a method of making a nucleic acid of any of the nucleic 
acids described above comprising the step of sequentially linking together the nucleotides in said nucleic acids. 

[0093] Another embodiment of the present invention is a method of making any of the polypeptides described 
S e :^^ 3CidS in ,en * h « less C -P™"9 step of sequel,, HnKing 

[0094] Another embodiment of the present invention is a method of making any of the polypeptides described 
above, wherein said polypeptides are 120 amino acids in length or less, comprising the step of sequentially linking 
together the amino acids in said polypeptides. 

Brief Description of the Sequence Listing 

[0095] SEQ ID NOs: 1, 3, 5, 7, 9, 11, and 13 are full-length cDNAs prepared using the methods described herein. 

[0096] SEQ ID NOs: 2, 4, 6, 8, 10, 12, and 14 are the polypeptides encoded by the nucleic acids of SEQ ID NOs: 1, 
3, 5, 7, 9, 11, and 13. 

[0097] SEQ ID NOs: 15, 16, 18, 19, 21 and 22 are primers whose use is described in the specification. 

[0098] SEQ ID NOs: 17, 20, and 23 are the sequences of nucleic acids containing transcription factor binding 
sites which were obtained as described below. 

[0099] SEQ ID NOs: 24-652 are nucleic acids having an incomplete ORF which encodes a signal peptide. As used 
herein, an "incomplete ORF" is an open reading frame in which a start codon has been identified but no stop codon 
has been identified. The locations of the incomplete ORFs and sequences encoding signal peptides are listed in the 
accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as described below 
is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" 
in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where 
proteolytic cleavage of the signal peptide occurs to generate a mature protein. 

[0100] SEQ ID NOs: 653-3720 are nucleic acids having an incomplete ORF in which no sequence encoding a 
signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a 
sequence encoding a signal peptide in these nucleic acids. The locations of the incomplete ORFs are listed in the 
accompanying Sequence Listing. 

[0101] SEQ ID NOs: 3721-3811 are nucleic acids having a complete ORF which encodes a signal peptide. As used 
herein, a "complete ORF" is an open reading frame in which a start codon and a stop codon have been identified. The 
locations of the complete ORFs and sequences encoding signal peptides are listed in the accompanying Sequence 
Listing. In addition, the von Heijne score of the signal peptide computed as described below is listed as the 
"score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the 
accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic 
cleavage of the signal peptide occurs to generate a mature protein. 

[0102] SEQ ID NOs: 3812-4100 are nucleic acids having a complete ORF in which no sequence encoding a signal 
peptide has been identified to date. However, it remains possible that subsequent analysis will identify a sequence 
encoding a signal peptide in these nucleic acids. The locations of the complete ORFs are listed in the accompanying 
Sequence Listing. 

[0103] SEQ ID NOs: 4101-4729 are "incomplete polypeptide sequences" which include a signal peptide. 
"Incomplete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon has 
been identified but no stop codon has been identified. These polypeptides are encoded by the nucleic acids of SEQ ID 
NOs: 24-652. The location of the signal peptide is listed in the accompanying Sequence Listing. In addition, the von 
Heijne score of the signal peptide computed as described below is listed as the "score" in the accompanying Sequence 
Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the 
signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a 
mature protein. 

[0104] SEQ ID NOs: 4730-7797 are incomplete polypeptide sequences in which no signal peptide has been 
identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these 
polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 653-3720. 

[0105] SEQ ID NOs: 7798-7888 are "complete polypeptide sequences" which include a signal peptide. "Complete 
polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon and a stop codon 
have been identified. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 3721-3811. The location of 
the signal peptide is listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal 
peptide computed as described below is listed as the "score" in the accompanying Sequence Listing. The sequence of 
the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence 
indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein. 
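
The nucleic acid and polypeptide SEQ ID NO ranges given in this section are the same size pair by pair, which the following Python sketch checks. The lookup function additionally assumes that sequences are paired in order within each range; that pairing is an assumption of the sketch and is not stated in this section. All names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) SEQ ID NOs

# (nucleic acid range, polypeptide range) as given in this section.
SEQ_ID_RANGES: List[Tuple[Range, Range]] = [
    ((24, 652), (4101, 4729)),     # incomplete ORF, signal peptide
    ((653, 3720), (4730, 7797)),   # incomplete ORF, no signal peptide identified
    ((3721, 3811), (7798, 7888)),  # complete ORF, signal peptide
    ((3812, 4100), (7889, 8177)),  # complete ORF, no signal peptide identified
]


def ranges_are_consistent() -> bool:
    """Each nucleic acid range must span as many IDs as its polypeptide range."""
    return all(n_hi - n_lo == p_hi - p_lo
               for (n_lo, n_hi), (p_lo, p_hi) in SEQ_ID_RANGES)


def polypeptide_seq_id(nucleic_seq_id: int) -> int:
    """Map a nucleic acid SEQ ID NO to a polypeptide SEQ ID NO.

    ASSUMPTION: sequences are paired in order within each range.
    """
    for (n_lo, n_hi), (p_lo, _) in SEQ_ID_RANGES:
        if n_lo <= nucleic_seq_id <= n_hi:
            return p_lo + (nucleic_seq_id - n_lo)
    raise ValueError(f"SEQ ID NO {nucleic_seq_id} has no listed polypeptide")
```

Under that ordering assumption every pair differs by the same offset of 4077, for example SEQ ID NO: 24 with SEQ ID NO: 4101.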

[0106] SEQ ID NOs: 7889-8177 are complete polypeptide sequences in which no signal peptide has been 
identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these 






EP 1 033 401 A2 

polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs: 3812-4100. 
idenMea o, »e unknown amino ^Z^^^^L'!^ ^ 

Brief Description of the Drawing 

[0109] Figure 1 summarizes the computer analysis procedure for obtaining consensus contigated ESTs. 

described herein P ^ US,ng the techniques for signal peptide identification 

[0111] Figure 3 illustrates methods for making extended cDNAs. 

SelolresponSs 4 fiT" * ° f thS Pr ° m ° ters isolated and *° «-y are assemb,ed w*h 

[0113] Figure 5 describes the transcription factor binding sites present in each of these promoters. 

Detailed Description of the Preferred Embodiment 

I. General Methods for Obtaining 5' ESTs derived from mRNAs with intact 5' ends 

a 1 bisaas x^ssr"* mMn - mRNAs *>• ■** 5 «* «» — 

EXAMPLE 1 



Preparation of mRNA 



coning proced,™ FolSg^hlTot iX *tSK°S ftSVXt. Z^l *,"' e, * ln " ,M «" 
examined by performing a Northern tZ UhllfTsr ? =5! RNA - lh ' ln,e ° nl >' " the mRNA «•» 
o« 9 on„=,eo,d, P ,, g before p^ <«*» <o .ho 

EXAMPLE 2 

cDNA Synthesis Using mRNA Templates Having Intact 5' Ends 

proWWornaiE^i^inVo^^osS SSSSTdSio^S SeC ° n< ' S " an " °""» " 

[0120] Following cDNA synthesis, the cDNAs were cloned into pBlueScript as described in Example 3 below. 




EXAMPLE 3 



Cloning of cDNAs derived from mRNA with intact 5' ends into BlueScript 

r,^ ,,0Wing SGCOnd strand s V ntnesis . ends of the cDNA were blunted with T4 DNA oolvmerm minbhet 

[0122] Clones containing the oligonucleotide tag attached were then selected as described in Example 4 below. 

EXAMPLE 4 

Selection of Clones Having the Oligonucleotide Tag Attached Thereto 

[0123] The plasmid DNAs containing 5" EST libraries made as described above were Durified fQiaoenl A nno.w 

st.sk: p r fied r? 8 ^ efc beads as describ " ed w"fv i^sss^ w'l'^ar sst: 

rami pi? ° l ! 90nU | cle ° tde was eMed t0 Vpically rank between 90 and 98% using dot blot analysis 

[0124] Following electroporation, the libraries were ordered in 384-microtiter plates (MTP). A copy of the MTP 
was stored for future needs. Then the libraries were transferred into 96 MTP and sequenced. 



EXAMPLE 5 

Sequencing of Inserts in Selected Clones 



SnS„ri„« ♦ P Were then se ^ enced usin 3 automatic ABI Prism 377 sequencers (Perkin Elmer) 

' an " """^ "*» < he ABI P ™ ™* "ncing 

EXAMPLE 6 

Obtaining 5' ESTs from Full-length cDNA Libraries Obtained from mRNA with Intact 5' Ends 

El?/* ■ Alterna ![ ve| y' 5 ' ESTs ma V be isolated from other cDNA or genomic DNA libraries Such cDNA or aenomir 
El h ? a r d synthesis is subsequently carried out for mRNAs joined to the oligonucleotide taa k 
A proprietary base-caller, working on a Unix system, automatically flagged suspect peaks, taking into account the 
shape of the peaks, the inter-peak resolution, and the noise level. The proprietary base-caller also performed an 
automatic trimming. 
4 suspect peaks was considered'unreliaot andtT 5^^^^.^ baSeS ha ™9 ™ e 
Sequences corresponding to cloning vector or ligation oligonucleotides were automatically removed from the EST 
sequences. However, the resulting EST sequences may contain 1 to 5 bases belonging to the above mentioned sequences 
at their 5' end. If needed, these can easily be removed on a case to case basis. 

datable tJ^Z?£^13S£i^ TJ". * 5 " ESTs «" NetGene™. a 

the NetGene™ database cefof p«t ^ d « ep ' Cted in F, ' 9ure 1 Before ^ s ^9 the ESTs in 

endogenous or exogenou^S whi h ch k were "« of interest, such as 

repeated sequences were identified and eliminated from further consideration UenC6S ' ^ ^ de Q e nerate sequences, or 

lelecin *es^L?T^ P??*~ - «* as the efficiency of the 5' 

obtained from NetGene™ database fottngSSi^ on 5'ESTs 

EXAMPLE 7 

Measurement of Sequencing Accuracy by Comparison to Known Sequences 

SSL.™ ?E?i e detTdt? k ;o a wn 'S^'SA^ " ^ 5 ' the ^™<" ° f 

First, a FASTA analysis with ovJ*an^h;SSaT5^onff»H a . nd COmf Tl ^ °" 9inal known sequences, 
matching an entry in the public h^^SSiS^S^ T&TJSSXSXX V * T* l ° th ° Se 
. then realigned with their cognate mRNA and dmir'n^mi 5 matched a known human mRNAwere 

deletions in the list of "errors" which would t SZS SZ^ used t0 in ^de substitutions, insertions, and 

EXAMPLE 8 

Determination of Efficiency of 5' EST Selection 

Cic-ckl^ P-edures iso.ated 5' ESTs which included 

from the elongation factor 1 zi^^^^^ZS'J^ Se < Uence * of < he e "<* of the 5' ESTs derived 
these genes. Since the transcription start srtes oS ™l fl 2L were „ C ° mpared to the known cDNA sequences of 
the percentage of derived 5" ES^s ISI^^E ffnS^SSS ^ * ^ * 

of 2?* JZt?cS£ZSZ5£ % ° f ^ ° b,ained 5> ESTs sequences Cose to or upstream 

KL™ dI?ab e £ n VSmi. a ar a aSysiI SaVclSed uL^T «" M ° * ESTs f ™ ESTs in the 
^actedfromGenBankdataK^^fer^S^ l 0 human mRNA sequences 

mRNAs included in the GeneBank M^w^taS^^ LVtSL ^ 85% ° f 5 ' ESTs derived from 
mRNA sequences available in the GenBankStaS«^S,2if^ m end8 ° f the kn0Wn se W e ™ As some of the 
these sequences will be counted as a" internal match itX ^lTJ^ZT sea - uences . 3 5' end matching with 
including the authentic 5' ends d"J^wnS!^S£- US6d h6re underes ^ates the yield of ESTs 

EXAMPLE 9 

Clustering of the 5' ESTs 

[0141] Since the cDNA libraries made above include multiple 5' ESTs derived from the same mRNA, overlapping 
5'ESTs may be assembled into continuous sequences. The following method (see Figure 1) describes how to efficiently 
cluster 5'ESTs in order to yield not only consensus 5'EST sequences for mRNAs derived from different genes but also 
consensus 5'EST sequences for different mRNAs, so-called variants, transcribed from the same gene, such as 
alternatively spliced mRNAs. This clustering was performed on a set of NetGene™ 5'EST sequences following 
elimination of endogenous contaminants, elimination of uninformative sequences and masking of repeats. 
[0142] The whole set of sequences was first partitioned into smaller sets, so-called clusters, containing 
sequences exhibiting perfect matches with each other on a given length. Such clusters contain 5'ESTs derived from a 
small number of different genes. Some 5'EST sequences were not clustered using this approach either because they 
were not homologous to any other sequence or because the homology was not properly detected. To overcome this 
problem, sequences not clustered, so called singletons, may be compared to the consensus contigated ESTs obtained 
later on and, if necessary, included in the appropriate clusters and used to compute other consensus contigated ESTs. 
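
The partitioning step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a "perfect match on a given length" is taken to mean a shared identical substring of length k, grouping is done by single linkage with a union-find structure, and only one strand is considered. The function name and the default value of k are hypothetical and are not taken from this document.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 20) -> List[List[str]]:
    """Group sequences that share at least one identical k-mer.

    Sequences left alone in their group correspond to singletons.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: Dict[str, str] = {}  # k-mer -> first sequence containing it
    for name, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            owner = first_seen.setdefault(seq[i:i + k], name)
            if owner != name:
                parent[find(name)] = find(owner)  # merge the two clusters

    groups = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())
```

Because linkage is transitive, two ESTs that share no k-mer directly still fall in the same cluster when a third EST bridges them, which is consistent with a cluster containing ESTs from more than one gene.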
[0143] Thereafter, all variants of a given gene were identified in each cluster as follows. Overlapping 
sequences inside a given cluster were represented as oriented graphs where each sequence was a node and each overlap 
an edge. Then, the different genes contained within a single graph, which were represented by different connected 
components, were separated from each other. Subsequently, the different variants of a same gene were 
isolated using an algorithm based on the detection of forks within a connected component. If desired, the consensus 
contigated EST sequences may be verified by identifying clones in nucleic acid samples derived from biological 
tissues, such as cDNA libraries, which hybridize to the probes based on the sequences of the consensus contigated 
ESTs and sequencing them. 
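
The graph representation described above can be sketched in Python as follows. This is a minimal illustration and not the algorithm actually used: it assumes the overlaps are supplied as oriented (upstream, downstream) pairs with redundant transitive overlaps already removed, and it treats any node having more than one successor or more than one predecessor as a fork. All names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def components_and_forks(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Tuple[List[List[str]], List[str]]:
    """Return (connected components, fork nodes) of an oriented overlap graph."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    for upstream, downstream in edges:
        succ[upstream].add(downstream)
        pred[downstream].add(upstream)

    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            # Orientation is ignored when collecting a component.
            for neighbour in succ[node] | pred[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))

    # A fork is a point where paths diverge or converge, suggesting variants.
    forks = sorted(n for n in seen if len(succ[n]) > 1 or len(pred[n]) > 1)
    return components, forks
```

In this sketch each component stands for one candidate gene, and each fork marks a position where two variants may part, as would be expected for alternatively spliced mRNAs.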

[0144] Overlapping 5'EST sequences belonging to the same variant as well as included 5'EST sequences 
belonging to the same cluster were then contigated and consensus contigated 5'EST sequences were generated for 
each variant. Some of the obtained consensus contigated 5'EST sequences were incomplete due to the fact that only 
included and overlapping 5'EST sequences were considered to isolate genes and due to the algorithm developed to find 
variants. These variant consensus contigated 5'EST sequences were extended as follows. Variants transcribed from the 
same gene were compared pairwise and the 5' EST consensus sequences that were incomplete either in 5' and/or in 3' 
were extended with the appropriate sequence from the other variants. All 5' EST consensus sequences eventually 
completed in 5' or 3' from each cluster were subsequently compared to the whole set of individual 5'EST sequences 
obtained for this cluster. 

EXAMPLE 10 

Identification of the Most Probable Open Reading Frame of 5' ESTs 

[0145] Subsequently, the most probable coding open reading frame (ORF) may be determined for each consensus 
assembled 5'EST or 5'EST as follows. 

[0146] Each nucleic acid sequence is first divided into several subsequences whose coding propensity is 
evaluated using different methods known to those skilled in the art such as the evaluation of N-mer frequency and 
its variants (Fickett and Tung, Nucleic Acids Res. 20:6441-50 (1992)) or the Average Mutual Information method (Grosse 
et al., International Conference on Intelligent Systems for Molecular Biology, Montreal, Canada, June 28-July 1, 1998). 
Each of the scores obtained by the techniques described above is then normalized by the extremities of its distribution 
and then fused using a neural network into a single score that represents the coding probability of a given 
subsequence. 
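
The normalization and fusion steps can be sketched in Python as follows. This is an illustration under stated assumptions: normalization by the distribution extremities is taken to mean min-max scaling, and a single logistic unit with hypothetical weights stands in for the neural network, whose architecture and parameters are not given in this document.

```python
import math
from typing import List, Sequence


def normalize_by_extremes(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] using the minimum and maximum observed values."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # no spread: return a neutral value
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Combine normalized scores for one subsequence into a single value in (0, 1).

    ASSUMPTION: one logistic unit replaces the neural network of the text.
    """
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Each method's scores would be normalized across all subsequences first, and the normalized values for a given subsequence would then be passed to the fusion step.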

[0147] The coding probability scores obtained for each subsequence, thus the probability score profiles obtained 
for each reading frame, are then linked to the initiation codons present on the sequence. For each open reading 
frame, defined as a nucleic acid sequence of at least 50 nucleotides beginning with an ATG codon, an ORF score is 
determined. Basically, this score is the sum of the probability scores computed for each subsequence corresponding 
to the considered ORF in the correct reading frame, corrected by a function that weights locally high 
score values negatively and sustained high score values positively. The chosen ORF is the one with the highest 
score. 
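
The ORF definition and selection described above can be sketched in Python as follows. This is a minimal illustration: the per-codon scoring function is supplied by the caller, and the correction function mentioned in the text is omitted because its form is not given. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, bool]  # (start, end, complete); 0-based, end exclusive


def find_orfs(seq: str, min_len: int = 50) -> List[Orf]:
    """List ATG-initiated ORFs of at least min_len nucleotides.

    An ORF runs to the first in-frame stop codon (complete) or, failing
    that, to the last full codon of the sequence (incomplete).
    """
    seq = seq.upper()
    orfs: List[Orf] = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        end, complete = start, False
        for pos in range(start, len(seq) - 2, 3):
            end = pos + 3
            if seq[pos:pos + 3] in STOP_CODONS:
                complete = True
                break
        if end - start >= min_len:
            orfs.append((start, end, complete))
    return orfs


def choose_orf(seq: str, codon_score: Callable[[str], float],
               min_len: int = 50) -> Optional[Orf]:
    """Pick the ORF with the highest summed score (no correction term)."""
    seq = seq.upper()
    best, best_score = None, float("-inf")
    for orf in find_orfs(seq, min_len):
        start, end, _ = orf
        total = sum(codon_score(seq[i:i + 3]) for i in range(start, end, 3))
        if total > best_score:
            best, best_score = orf, total
    return best
```

The `complete` flag distinguishes ORFs ending at a stop codon from those extending to the end of the sequence.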

[0148] Two kinds of ORFs are considered. In some embodiments, 5'ESTs encoding ORFs of at least 50 amino 
acids extending up to the end of the consensus assembled 5'EST sequences are obtained. In other embodiments, 
5'ESTs encoding complete ORFs, namely ORFs with start and stop codons, containing at least 100 amino acids are 
obtained. 

EXAMPLE 11 

Sequence Analysis 

[0149] Application of the clustering method described in Example 9 to a selected set of 126,735 NetGene™ 5'ESTs 
free from endogenous contaminants and uninformative sequences yielded 9490 consensus assembled 5'EST 
rnnt^HV/^ f0f 3 ,0ta ' 5 8037 9 enes clustere d representing 98.973 individual 5'ESTs. One of them which 
£23 froiShrrS 3 ^ Sh ° Wn t0 C ° ntain ChimeraS MS 40 C ° mpariSOn t0 PUbliC -P^cestas 
Both non-clustered 5'ESTs, i.e. singletons, and consensus contigated 5'ESTs were then compared to public
databases as follows. Those sequences matching human mRNA sequences were eliminated from further consideration.
Following masking of repeats, those sequences matching sequences that have already been discovered by the
inventors, namely sequences exhibiting more than 90% homology over stretches longer than 40 nucleotides with
overhangs shorter than 10 nucleotides, were removed from further consideration. The final set constitutes the
5'ESTs of this invention (SEQ ID NOs: 24-4100 and 8178-36681), i.e. 7609 consensus contigated 5'ESTs from 6398
clusters containing 31,267 5'ESTs and 24,972 singletons.

[0151] Of the 6398 obtained clusters, 658 were shown to be multivariant, i.e. to contain several variants of the

ESthTc^lJ. 8 ^ f ° r "J*.* th6 mUffiVanant Clusters named b V rts inte mal reference^ cZn) he 
list of the consensus sequences of all variants, each variant being represented by a different SEQ ID NO.

[0152] Subsequently, the most probable open reading frame was determined, as described in Example 10, for all
sequences. A total of 3,697 5'ESTs (SEQ ID NOs: 24-3720) encoding incomplete ORFs (SEQ ID NOs: 4101-7797)
were found. In addition, 380 5'ESTs (SEQ ID NOs: 3721-4100) encoding complete ORFs
(SEQ ID NOs: 7798-8177) of at least 100 amino acids were found.

The nucleotide sequences of SEQ ID NOs: 24-4100 and 8178-36681 and the amino acid sequences
encoded by SEQ ID NOs: 24-4100 (i.e. amino acid sequences of SEQ ID NOs: 4101-8177) are provided in the

may contain "Xa^ Ve^Ztt ?£? 

a'iZnn^^ZrT; v ' • ° ^ ,uuc VVM ^" ^ nr,ot De 'aenimea Decause or nucleotide sequence ambiguity or (2) 

iSlSllSZjSm^ SeQUenCe ^ ° M M *• S '* U '™ wS 

If the nucleic acid sequences of SEQ ID NOs: 24-4100 and 8178-36681 are suspected of containing
one or more incorrect or ambiguous nucleotides, the ambiguities can readily be resolved by resequencing a fragment
containing the nucleotides to be evaluated. If one or more incorrect or ambiguous nucleotides are detected, the

other consensus contigated sequences on which other ORFs would be identified. Nucleic acid fragments for So 

described herein. Resolution of any such ambiguities or errors may be facilitated by using primers which hv2e 

InZJZt h 5 °' 7 w baS l S ° f the ambiguity or error U P° n resolution of a " error or ambiguity theTorrespondinq 
SS w ,« n ? ( h ma *! n, ' ,e Pr T, n ! eqUenC6S enC0ded ^ the DNA containin 9 the ^ror or ambiguity The am no 

S^wfh » . .? P / a u en ° 0ded by a par1iciJlar clone can also be determined by expression of the ctonein a 
suitable host cell, collecting the protein, and determining its sequence f»»»on or ine cione in a 

If an amino acid sequence of SEQ ID NOs: 4101-8177 is suspected of containing a truncated ORF as a result of a
frameshift in the sequence, such frameshifting errors may be corrected by combining the following two approaches.
The first one involves thorough examination of all double predictions, i.e. all cases where the probability scores
for two ORFs located on different reading frames are high and close. The fine examination of the region where the
two possible ORFs overlap may help to detect the frameshift. In the second approach, homologies with known
proteins are used to correct suspected frameshifts.

EXAMPLE 12 

Identification of Potential Signal Sequences in 5' ESTs

The amino acid sequences of SEQ ID NOs: 4101-8177 were then searched to identify potential signal motifs
using slight modifications of the procedures disclosed in Von Heijne, Nucleic Acids Res. 14:4683-4690, 1986. Those
sequences encoding a 15 amino acid long stretch with a score of at least 3.5 in the Von Heijne signal peptide
identification matrix were considered to possess a signal sequence and were included in a database called

The sequences of the 720 nucleic acid sequences containing a signal sequence (SEQ ID NOs: 24-652 and
3721-3811) and the corresponding polypeptides with a potential signal peptide (SEQ ID NOs: 4101-4729 and 7798-7888)
are provided in the Sequence Listing appended hereto. The signal peptides of such polypeptides are indicated as
features in the appended Sequence Listing. It should be noted that, in accordance with the regulations governing
Sequence Listings, in the appended Sequence Listing, the full protein (i.e. the protein containing the signal
peptide and the mature protein) extends from an amino acid residue having a negative number through a positively
numbered C-terminal amino acid residue. Thus, the first amino acid of the mature protein resulting from cleavage of
the signal peptide is designated as amino acid number 1, and the first amino acid of the signal peptide is
designated with the appropriate negative number.

To confirm the accuracy of the above method for identifying signal sequences, the analysis of Example 13 was
performed.

EXAMPLE 13 
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Confirmation of Accuracy of Identification nf Pnt ent j a | Siflna | Sgf1uences in s . FRTg 

higher than 3.5 (false < t^^^h^Z^ St^ r> ^ "T^ ° f non " secreted P roteins having a scorf 
could be calculated ' b6f ° f SeCfeted prote,ns havln 9 a score low er than 3.5 (false negatives) 

525 is infcc^nfs^ 3 P ^ ^ * »• 5 ' ^ * ». 

assumption that 10% 9 O f human^L^ based - *h« the 

results of this analysis are shown in Figure 2 assumption that 20 A of human proteins are secreted. The 

cyclophilin-like protein human nieiotronin anH E 9 „ k' ? -T interferon '"duced monokine precursor, secreted 
^Ll^%^.^Jl^!^^ b,0t,nidaSe Pr6CUrS0r Thus ' the above ^"essfully 

containing signa seauence selectinn v/P^tnre vA/;th fh« - "!T'_,vrS o.ooo.od/. browth of host ce s 

25ST f 4 e 5 ' ^^^^^^^^^ w consensus 5 ' EST si9nal sequence 

promoter-signa? sequence ^ as P™. as described below, or by constructing 

assayable reporter protan. ^.SZ^^^J^J^J^ the signal peptide and an 

™DNA insert 
EXAMPLE 14 



Assessment of the Novelty Rate of 5'ESTs
[0164] 



Thus, about 90% of 5'ESTs or consensus assembled 5'ESTs were considered unidentified.

".^Evaluation of Spatial and Temporal Expression of mRNAs Corresponding to the 5'ESTs or Extended 

EXAMPLE 15 

Expression Patterns of mRNAs From Which the 5' ESTs Were Obtained

[0166] Table II shows the spatial distribution of each of the 5'ESTs fnon ri.i^prpH f<?t«=n , 
contigated ESTs respectively. Table II provides the SEQ ID NOs of 'fh£ 5' 1^?^ ES L 8 l an * * ®! ch consensus 
non-clustered ESTs or singletons) and consensus contaated EST* t*Z i. ^ ? ^ t0 alternat,ve,v herei " » 
type of tissue which were used to asseTbTth SSnl = ^ " ateo hsts the number of ESTs from each 
contain a single 5' EST from a sinqle tissue af e ? £ on * e " sus E , STs - The SEQ ID NOs: in Table II which 

letter. The correspondent > b^^^^l^ E !S■^t^ e f U a, ^ ? Tab ' 6 " is enCoded b * a 

g^r^ 
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5 rf 38 T" aS , th6ir eXpreSSi °" leve ^ ^y^deJefm^ned as described in Example 16 below 

SSL cSSSJS^SIl^MK STL?,"? t~5&*%2 ^ is 

or tempera, manner, as will be discussed* more de ta5 Mow 9 "** ^ ° f 9606 Pr ° dUCt in a desired s P atial 

5 Se statS^To be'SS ^ampS^cLf ? Wh0Se co ^ ond ^ "RNAs are associated wfth 
expression, or under e^^^J^^^^^ 6 ^, ™* result fr ™ the lack of expression, over 
mRNA expression patterns and quaXs 

• asr a partcuiar 5 ' ^^^^^~^-z 

" census iSeSV^ for , ESTs and 
adjacent to the 5' ESTs and conVensu^ 

character^ may be delayed unfleS^cD^ f fu S ° be a PP recia,ed that if desired, 

consensus contigated 5' ESTs memselves " ° bta ' ned rather than cha ™terizing the 5" ESTs o 
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EXAMPLE 16 



Evaluation of Expression Levels and Patterns of mRNAs

[0171] Expression levels and patterns of mRNAs corresnnnriinn t„ bst.^ =. ... . 

2 soiuuon nybndization with long probes as described in \^m^Jo^' ♦'T"? 'T'*"' may De analyzed by 
an EST-related nucleic acid, fragment of an i EST related nSS d ™ r Appl,cat,on No ' W0 97/05277. Briefly, 
acid, or fragment of a positional segment o an EST-related H posltlona ' se 9 ment of an EST-related nucleic 

to be characterized is inserted atl SnSj T^ln^^^^JJT'^? ^ ^ e e ^ n ^ e ^ 
polymerase promoter to produce antisenseRNA P^&ff^^^^ * i ^ iterK ^f ae (T3 ' T7 or SP6 ) RNA 
nucleic acid, positional segment of an EST related 3 !L f d nUCl f° acid ' fra 9 me nt of an EST related 

25 related nucleic acid is 100 or more n^cleoVdesl ^^Th Thfnlf h 39 ? 6 ^ ° f I P ° sitional se 9 ment of a " EST- 
of ribonucleotides comprising m^mS^£^^^^ ""^ transcribed in the presence 
labeled RNA is hybridized i , solutio wK mRNA ^att%S? «, »- d DIQ -V TP) An excess of this ^ubly 
performed under standard stringent ^^ii^^^^^^T ° f '" terest The hyMcflzallons are 
The unhybridized probe is removed Iw dioSn w» i rfh™ J? 3 r 8 ? % formam| de, 0.4 M NaCI buffer, pH 7-8). 

T1, Phy M, 02 or A) ^S^SS^^^^^T SpSClfic f ° r sin 9le-stranded RNA (i.e. RNases CL3 

30 Plate coated wrth lept id n The pr sence° oMne dS^SLS*'" V,? * a hybrid °" 3 ™'°«raton 
= edby ^ the hybrid to be detected and 

related ^^^7;^^ ^J^S^ « « 

nucleotide sequences for the serial analysis otm?™J*^(s££i^ T . -"fL"*? a ' S ° be ta " ed wrth 
305 241 A. In this method, cDNAs are oreoared from f^LT GE) 35 dsclosed ,n UK Patent Application No. 2 
which gene expression patterns Tmust be ^etem^^. «^n?!!ntl A 0r9 " n, ' m °' ° th6r S0UrCe of nuc,eic acid for 
each pool are cleaved with a fi™VeSc^d*^^^ ThecDNAsin 
which is likely to be present at least once in mos t cDNA ! ta„m-nf k" h 9 hav,n9 a reco 9ni«on site 

the cleaved cDNA are isolated by binding to a cam.J IS „ Wh ' Ch °° n,a,n the 5 ' or 3 ' most re 9 ion of 
oligonucleotide linker having a first LqueTce or hybKo Tof an amnL^ StFePtaVidin , COat6d beads ' A first 
site for a so called tagging endonuclease is feata In thl ^LT/ ^ma 3 " 0 " pnmer and an intefnal restriction 
second endonuclease ices short ^gmX^e^Mto^ *"* P °°' Digesbon wrth the 

S, restrfctirsite" if Sed to't Sed ^7" ^ ^'f ^ ° f ™ *™ and a " 

are also digested with the SgSg endo„u^?a^f,hn^ nd f P °° L J ?* cDNA fra 9 mente " the second pool 
pool. The tags resulting from digestio S Te L Jnc ^ second S^*^ fr ° m ,he cDNAs in the second 

endonuclease are ligated to on'e another to Sduce so called d taos In so'me 7? *5 ^ 

concatamenzed to produce ligation products containinc Z o fn ?nn !,l 9 1 . 6 ernbodln ients, the ditags are 
and compared to the sequences of the i EST-Sted nuS^ 1 w f f ' 7 he X ? 9 se ^ uence s ™ then determined 
segment of an EST-related nZ^ ^^I^^^J^^ ° f a " f related "^leic acid, positional 
determine which 5" ESTs, contigatec ^consensus 9 5 f ESTs o^nl^^ ° f ™ EST - re,ated nucleic acid to 
organism, or other source of nucleic acl* Lhirh f*- t^l? * ended . c D NA s are expressed in the cell, tissue, 

the S- ESTs, configated consensu^ 5' KT or eSded cDNAs'in fhTS ? *" ^ th6 eXPreSSi ° n P3ttern of 
nucleic acids is obtained. exienaed cDNAs in the cell, tissue, organism, or other source of 

a^Lans^^ P^ormed using arrays. As used herein, the term 

fragments of EST related nuctec acid^ f posttona 1 ^^m^^" 11 -? ES ]-' e ^ d ^ a °ds, 
segments of EST-related nucleic acids PreferabK T the ^eIt r.rln n, f nucleic aads, or fragments of positional 
acids, positional segments EST-related nude c aci£ or I»™*1*? &C fra 9 mente of EST related nudeic 
acids are at least 15 nucleotide! in lengS ™M^vJj^T *8?^ se 9 m ^nts of EST-related nudeic 
related nucieic acids, positional segments EST rJLti nZ^'J^ EST ; related nucleic acids, fragments of EST 
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in length. Jn some embodiments, the EST-related nucleic acids, fragments of EST related nucleic acids positional 
segments EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be more 
than 500 nucleotides long. 

[0175] For example, quantitative analysis of gene expression may be performed with EST-related nucleic acids,
fragments of EST related nucleic acids, positional segments EST-related nucleic acids, or fragments of positional
segments of EST-related nucleic acids in a complementary DNA microarray as described by Schena et
al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. U.S.A. 93:10614-10619, 1996). EST-related nucleic acids,
fragments of EST related nucleic acids, positional segments EST-related nucleic acids, or fragments of positional
segments of EST-related nucleic acids are amplified by PCR and arrayed from 96-well microtiter plates onto silylated
microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of
the array elements and rinsed once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium
borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min,
rinsed twice with water, air dried and stored in the dark at 25°C.

[0176] Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of
reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours
at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1 x SSC/0.2% SDS), then for 10 min at
room temperature in high stringency wash buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a
fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements
are obtained by taking the average of the ratios of two independent hybridizations.
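
The averaging step mentioned above can be sketched as follows. The text does not say which two signals form each ratio, so this minimal Python example assumes each hybridization yields a pair of fluorescence intensities (sample and reference) for a given array element; the function name `mean_expression_ratio` and the tuple layout are hypothetical.

```python
def mean_expression_ratio(hyb1, hyb2):
    """Average the sample/reference ratios of two independent hybridizations.

    hyb1, hyb2: (sample_signal, reference_signal) tuples for one array
    element. The pairing of signals is an assumption made for illustration.
    """
    ratios = [sample / reference for sample, reference in (hyb1, hyb2)]
    return sum(ratios) / len(ratios)
```

For example, ratios of 2.0 and 3.0 from the two hybridizations give a reported differential expression of 2.5.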

[0177] Quantitative analysis of the expression of genes may also be performed with EST-related nucleic acids,
fragments of EST related nucleic acids, positional segments EST-related nucleic acids, or fragments of positional
segments of EST-related nucleic acids in complementary DNA arrays as described by Pietu et al. (Genome
Research 6:492-503, 1996). The EST-related nucleic acids, fragments of EST related nucleic acids, positional
segments EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids thereof are PCR
amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with
radioactive nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are detected
by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of
differentially expressed mRNAs is then performed.

[0178] Alternatively, expression analysis of the EST-related nucleic acids, fragments of EST related nucleic
acids, positional segments EST-related nucleic acids, or fragments of positional segments of EST-related nucleic
acids can be done through high density nucleotide arrays as described by Lockhart et al. (Nature Biotechnology
14:1675-1680, 1996) and Sosnowsky et al. (Proc. Natl. Acad. Sci. 94:1119-1123, 1997). Oligonucleotides of 15-50
nucleotides corresponding to sequences of EST-related nucleic acids, fragments of EST related nucleic acids,
positional segments EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are
synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowsky et
al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.

[0179] cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are
synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100
nucleotides. The said probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and
application of different electric fields (Sosnowsky et al., supra), the dyes or labeling compounds are detected and
quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating
from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of
the mRNA corresponding to the 5' EST, consensus contigated 5' EST or extended cDNA from which the oligonucleotide
sequence has been designed.

III. Use of 5' ESTs to Clone Extended cDNAs and to Clone the Corresponding Genomic DNAs

[0180] Once 5' ESTs or consensus contigated 5' ESTs which include the 5' end of the corresponding mRNAs have
been selected using the procedures described above, they can be utilized to isolate extended cDNAs which contain
sequences adjacent to the 5' ESTs or contigated consensus 5' ESTs. The extended cDNAs may include the entire
coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site.
If the extended cDNA encodes a secreted protein, it may contain the signal sequence and the sequence encoding the
mature protein remaining after cleavage of the signal peptide. Extended cDNAs which include the entire coding
sequence of the protein encoded by the corresponding mRNA are referred to herein as "full-length cDNAs".
Alternatively, the extended cDNAs may not include the entire coding sequence of the protein encoded by the
corresponding mRNA, although they do include sequences adjacent to the 5'ESTs or contigated consensus 5' ESTs. In
some embodiments in which the extended cDNAs are derived from an mRNA encoding a secreted protein, the
extended cDNAs may include only the sequence encoding the mature protein remaining after cleavage of the signal
peptide, or only the sequence encoding the signal peptide.

[0181] Example 17 below describes a general method for obtaining extended cDNAs using 5' ESTs or consensus 
contigated 5' ESTs. Example 28 below describes the cloning and sequencing of several extended cDNAs including 
extended cDNAs which include the entire coding sequence and authentic 5' end of the corresponding mRNA for several 
secreted proteins. 

[0182] The methods of Examples 17 and 18 can also be used to obtain extended cDNAs which encode less than
the entire coding sequence of proteins encoded by the genes corresponding to the 5' ESTs or consensus contigated
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^VSoTs^^ 15 20 25 30 

EXAMPLE 17 



Sg^^ cONAs whjon ,nrl„de the F n ti , Pndinn P„,nn 

extended cDNAs for any !L!^l^^% 0 ^ n ^Z Th,S t method , m ^ be ^"'ed to obtain 

consensus contgated 5' Lis encoding s^S^^^.rZSi^l " ^ «* 
1. Obtaining Extended cDNAs



a) First strand synthesis 



[ l\i 4 L., The m ! thod t ! kes ^vantage of the known 5' sequence of the mRNA. A — - u 

b) Second strand synthesis 

//bioinformatics.wei 2 mann.a C JI/s 0 ftware/PC-RareLc/ m anuel htmT ' 5U 35 PC - Raf6 (http '' 

2, Sequencing of F ull L en gth E xtended rHMA S or Franm^nts w 

Eosp «£K ?» jc^st^ L ne s ;ctn p d ri 5 mers comp r e f : pcr use using 

translation initiation codon thus yieldinc la mta PPR '™,h ?, ? * hesec ° nd 5 P r,mer 15 ,ocate <l upstream of the 
length extended cDNA may b use I E Pa d"eS clonfno ^.1 m '"^ ,he ent ' re C ° ding Sw > uence - Such a ™ 
located downstream of the" tifi„^|£E £5? KSj "y' e,S7pC R Zd^T? 6 ^'"V* * * 
SUCh inCOm P' ete PCR « are submitted to a n5S iZlrl desSibK ISTfiZ* ° f ^ 

a) Nested PCR products containing complete ORFs 

b) Nested PCR products containing incomplete ORFs
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Sequencing extended cDNAs 

3. Cloning of Full Length Extended cDNAs

t£tL,H J^S* pro ^ uct 1 con ' ainin 9 the ™ coding sequence is then cloned in an appropriate vector For example 
^ o e ^ endedc DNAs can be cloned into any expression vector known in the art « vector, t-or example. 

Slef directiortheorpnLr^r 0bta / n ! d 33 deSCribed above are blunt ended molecu '<* that can be cloned in 

4 . Selectaon of dp n «t full |e n nth sequences obtained from th» f.qt 8 of thH pr^nt in„» nt| „ n 

[0199] Then for each remaining full length extended cDNA containing several ORFs a preselection of ORFs m» 
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conceivable cDNAs that will SSXx^^t^^^'T 1 ^ 9 Can C ' ea ' e and iden ^ «V «* the many 
allelic variants or other homologou VnudTaSa^S, ta id^S 9 ! ^ ^ ^^ 9 ? ne,fc * code ■ For example ' 
acids encoding the desired amino Lid .«M^b£X^k" deSC " bed be,0W - nucleic 

Uncesfc^S^ - -own codon or ,odon pair 

corZ>nding7^ WhiCh '" dUde the of the 

based methods may also be emp "ye ds m»!3S h c ° rres P ond| n9 "RNA, traditional hybridization 
the mRNAs from which the 5' ESK W ^^°£ n ^^ s %^£ ° bta,n th * 9 e "° miC DNAs which encode 
' 5 extended cDNAs, or nucleic acids which are EST! ». w I of,? denVed ' mRNAs corresponding to the 
ESTs. Example 18 below p^^S^JZltZ^ ^ 5 ESTS> ° r C ° ntigated COnsensus 5 ' 

SrL PP ^tSlkoZ SSS STf^ T* ° f 3 Si9nal P< * tide in the first 50 amino-acids or 
Heiine (Nuc. AciJs ™ 1 /XT,' am,n ?. ac ! d8 ° r less j " the ORF, using the matrix method of von 

• ----- ... iwww-rwOuii.-nr-iiiiixfiriirio m iti *x •» _ ._. : i_ _ i • ,._ 









Heiine (Nuc. Acid, R«* 14- A *Si aVo* ™~ ^ ™'™ ac, . as or ,ess ,n the ORF, using 
— _rwcrw v ,oou ^ «"u me modification described in Example 12. 

d) Homology to either nucleotide or protein sequences 



» Sate a'dlsTs™ t0 k fr" "2* S- — «* - the 

S amino^dcU^ in f *£™ ?>™ compared to 

patented protein sequences) These analyses Sn^o™2.?- ^^ T n R -? ,d Gen Ptept (Derwenfs database of 
maximum of 10 matches Sequenc^ S^l^^ Wl * the .P ara ™ter W=8 and allowing a 

35 sequences are recogneed as alreadyldentified Sinf h ° W ' n9 h ° mol °^ to known ^ 

cDNAs are c^^^^ * the top strand of fulMength extended 

Sequences of full-length extended cDNA ^^TtoTf& H^"* Us,n 9 BLASTX wittl the parameter E=0.001. 
already identified proteins. ° 70 /o homoloa y °™ 30 amino acid stretches are detected as 

40 5 Section of cloned full-length sciences obtains fm m «■ rc x s nf , hp nrpgont ' 

Computer SS^SSS f « S^JSLS: a ' rea ? b f en CharaCtefiZed by the ^-mentioned 
containing sequences of interest aromatic procedure ,n order to preselect full-length extended cDNAs 



a) Automatic sequence preselection 
[0214] 



55 [0215] 



negal sele^^raSn ^TjlTjZ^^ *" °" ** endS «* ™ M ^ a 

PCR artifacts as follows. Sequence* im^^S^S^!^^ muKng fr ° m ei,her contaminants or 
sequences are discarded as well a those S^m?^? Ve ° t0r DNA; tRNA - mtRNA ' rRN * 

defined in section 4 a) Sequence! obK ?Srt 5«„? q * eMn9 eXtensive homolo 9y to repeats as 
case a) but lacking polyA ta« are diseased O^SfT^LT 0 P "™ rS °" 5 ' and 3 ' ta 9 s 1. 

polyAtail (case a) or befor th Ind 0 , Z ^ cloned 3°5tr S > ilZ* uT^P 8 ** endin9 either before the 
proteins such as mature proteins which size is lei th^Jn 20 ^ini , h P '' , T "en ORFs containing unlikely mature 
size are eliminated 55 than 20 ammo aclds or less than 25% of the immature protein 



performed JsK ^^^ B T«F^. M .? , 7 nfl ° RFS ' 3 PreSe,eCti ° n ° f ° RFs is 

similar, the chosen ORF is the orw Twhich sbnai oSp hi ml h h 9 . 3 ' Peptlde ,S P referre d- If the ORF sizes are 
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55 



MTSK ^^^XEBX^' 30 nUCl60WeS " * - sa me 

or from alternative splicing, identical sequences or sequenc^ JSFtTf*?"" 'u^" 9 from intemal *™"9 
serves as a basis for manual selection of the sequences 8eWal frameshlfts - This automatic analyste 

Manual sequence selection 

class as follows. ORF sequences encoded by clones beSa to thf«l ° l0nes belon 9 in 9 t0 tne «me 

homology between nucleotide sequences of cton« belonSna ta^hp «L , 6 daSS are ali9ned and compared. If the 
stretches or if the homology between amino add1L™«r« 0 «? T? daSS 18 more ,han 90% °ver 30 nucleotide 
over 20 amino acid stretches, thar! Ve Tone are consiS atb^nnt 0 ^" 9 , K?' Same c,ass is ™» 
exhibrtmg matches with known amino acid sequence? o the b^t den * Ca | The chosen 0RF is ^ the one 
automate sequence preselection section. If the nuSde a Jd Z JT^ 9 '° the criteria mentioned m the 
re =f , the clones are said to encode ^J^^^t ITT^^^ 

Eg J^sJ^^ sequences of interest is performed using the 

^rnoiogies with known nucleic acids and ^^^^^ «9™0 «• first checked. Then 

an mRNA ceding for an already known protSn the sequence ?s keDt Ixamn^ T^S alternative s ^ ° f 
cONAs containing sequences of interest are described X SmoKi S"* 8 ° f SU( * cloned ™-'ength extended 
nserts or located on chromosome breaking points afassesled b? hoL^^f "?u S reSultin9 frofn chimera ° r *»"ble 
this procedure. a n ms as assessed by homology to other sequences are discarded during 

wSh includfdesSd PoTn?o7?h?e^ engineered to obtain nucleic acids 

v,tro oligonucleotide synthesis. For example nucteS adds S '^'j^^f such as subcloning, PGR, or in 
sequences encoding the signal peptide and *e mS.™ ^ ' Ude ° nly the fu " codin 9 sequences ( e the 

Protein may be obtained. For examp^ 2 nucfek ^add ma !T ° f the C ° ding sequences for fe encoded 

g 100, 150^200, 300, 400 or 500 l&Sfi^^^*** 15 - 1 «- ». 25, 28, 30, 35, 40. 50. 

-° d K?^^ to determine the amino acid sequence it 
conceivable cDNAs that will encode that p^n by 3m^u2nT ^ md iden,ify an V of the ™"y 
allelic variants or other homologous nucleic acids «V S.ntT rf * e flenefc e 0 *- For axampla 

acid*, encoding the desired amino Lid -^Sh^S.^^^" ^ « 

preferences "J^SS^ "*» *• — codon or codon pair 

^P 0 "^*^^^ C ° NAs Whi f ^ the authentic 5'end of the 

hybridization based methods may to ^S£^^J^™ , of ! he c °^Ponding mRNA, traditional 
which encode the m RIM As from which J»?WT.nr, y a ' S ° be used to obtain the genomic ONAs 

corresponding to the extended cDNA of nucleic Sd^ 'ySS^^I^ 1 5 ' ESTS were derived . mSJK 
consensus contigated 5' EST* w£r^^ eXtended c ™As. 5' ESTs, or 

EXAMPLE 18 

~ M txTenae d_ cDNAs, 5 ESTs nr C on s ensus Contig ateH * FST ~ 

may comprise at least 10, 15, 18, 20 25 28 30 3 40 50 7. inn «£ h 2? b,e probe The detectab '' probe 
EST*"?' ? ? ""^^W^KT iMj^ 15 °' 200 ' 30 °. 4 °° 500 consecutive 
11 TeChn,QUeS ^ CDNA *« * a < D ™ library w^ch hybridize to a given probe sequence are 
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?9 S 89° T^ m ?i f r k Bt aL Mol ° cu,ar 9! onin ? : A Laboratory Manuel 2d Ed.. Cold Spring Harbor Laboratory Press 
1989. The same techniques may be used to isolate genomic DNAs.

[0227] Briefly, cDNA or genomic DNA clones which hybridize to the detectable probe are identified and isolated
for further manipulation as follows. The detectable probe described in the preceding paragraph is labeled with a
detectable label such as a radioisotope or a fluorescent molecule. Techniques for labeling the probe are well known
and include phosphorylation with polynucleotide kinase, nick translation, transcription, and non-radioactive
techniques. The cDNAs or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and
denatured. After blocking of nonspecific sites, the filter is incubated with the labeled probe for an amount of
time sufficient to allow binding of the probe to cDNAs or genomic DNAs containing a sequence capable of hybridizing
thereto.

By varying the stringency of the hybridization conditions used to identify cDNAs or genomic DNAs which
hybridize to the detectable probe, cDNAs or genomic DNAs having different levels of homology to the probe can be
identified and isolated.

1. Identification of cDNA or Genomic DNA Sequences Having a High Degree of Homology to the Labeled Probe

To identify cDNAs or genomic DNAs having a high degree of homology to the probe sequence, the melting
temperature of the probe may be calculated using the following formulas:

[0230] For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated using the
formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G + C) - (600/N), where N is the length of the probe.
If the hybridization is carried out in a solution containing formamide, the melting temperature may be
calculated using the equation Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G + C) - (0.63% formamide) - (600/N), where N
is the length of the probe.
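
The two melting-temperature formulas above can be sketched in Python as a single function. This is a minimal sketch under stated assumptions: the logarithm is taken as base 10, [Na+] is in mol/L, and both the G + C content and the formamide concentration are entered as percentages (0-100), as in the commonly used form of this empirical formula. The function name `melting_temperature` and its parameter names are hypothetical.

```python
import math


def melting_temperature(n, gc_percent, na_molar, formamide_percent=0.0):
    """Estimate probe Tm in degrees C from the empirical formula in the text.

    n                 probe length in nucleotides (the text gives the
                      formamide-free formula for probes of 14 to 70 nt)
    gc_percent        G + C content as a percentage (assumption: 0-100)
    na_molar          sodium ion concentration in mol/L
    formamide_percent formamide concentration as a percentage; 0 gives the
                      formamide-free formula
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / n)
```

With these assumptions, a 20-nucleotide probe with 50% G + C in 1 M sodium gives about 72°C without formamide and about 40.5°C in 50% formamide.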
[0232] Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured
fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon
sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.
Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above.
Where the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. The
filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize

l?2 e 0 n 0 d lSd?^fSh Tl^TT SeqUen K C6S com P |emen tary thereto or homoCus CfoIS 

over 200 nucleotides in length, the hybridization may be carried out at 15-25°C below the Tm. For shorter probes

h b dStio^^ IT te C ° ndUCted at 15 -25'C be«ow lh m e Tm. iSSfJS* S 

hybridizations in 6X SSC, the hybridization is conducted at approximately 68°C. Preferably, for hybridizations in
50% formamide containing solutions, the hybridization is conducted at approximately 42°C.

All of the foregoing hybridizations would be considered to be under "stringent" conditions.
[0235] Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for 15 minutes. The
filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour. Thereafter, the solution
is washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X SSC at room
temperature.

cDNAs or genomic DNAs which have hybridized to the probe are identified by autoradiography or other
conventional techniques.

2. Obtaining cDNA or Genomic DNA Sequences Having Lower Degrees of Homology to the Labeled Probe

The above procedure may be modified to identify cDNAs or genomic DNAs having decreasing levels of
homology to the probe sequence. For example, to obtain cDNAs or genomic DNAs of decreasing homology to the
detectable probe, the hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a
hybridization buffer having a sodium concentration of approximately 1M. These conditions are considered to be
"moderate" conditions above 50°C and "low" conditions below 50°C.
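
The temperature series described above can be enumerated with a short Python sketch. It assumes whole-degree steps and stops at the last step that does not fall below the lower limit; the text does not say how exactly 50°C is classified, so that value is reported as unspecified here. The names `temperature_series` and `stringency_label` are hypothetical.

```python
def temperature_series(start_c=68, stop_c=42, step_c=5):
    """Hybridization temperatures from start_c downward, never below stop_c."""
    return list(range(start_c, stop_c - 1, -step_c))


def stringency_label(temp_c):
    """Label a temperature using the 50 degree C boundary given in the text."""
    if temp_c > 50:
        return "moderate"
    if temp_c < 50:
        return "low"
    return "unspecified"  # the text does not classify exactly 50 degrees C
```

With 5°C steps starting at 68°C, the series is 68, 63, 58, 53, 48 and 43°C, so the first four steps fall under "moderate" conditions and the last two under "low" conditions.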

Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a
temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5%
increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following
hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions are considered to be
"moderate" conditions above 25% formamide and "low" conditions below 25% formamide.

[0239] cDNAs or genomic DNAs which have hybridized to the probe are identified by autoradiography. 

Determination of the Degree of Homology Between the Obtained cDNAs or Genomic DNAs and 5'ESTs, Consensus
Contigated 5'ESTs, or Extended cDNAs or Between the Polypeptides Encoded by the Obtained cDNAs or Genomic
DNAs and the Polypeptides Encoded by the 5'ESTs, Consensus Contigated 5'ESTs, or Extended cDNAs

[0240] To determine the level of homology between the hybridized cDNA or genomic DNA and the 5'EST,
consensus contigated 5'EST or extended cDNA from which the probe was derived, the nucleotide sequences of the
hybridized nucleic acid and the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was
derived are compared. The sequences of the 5'EST, consensus contigated 5'EST or extended cDNA from which the
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ZoH^f de " Ved f d the , s e^ences of the cDNA or genomic DNA which hybridized to the detectable probe may be 
stored on a computer readable medium as described below and compared to one another using any of a variety of
algorithms familiar to those skilled in the art, including those described below.

To determine the level of homology between the polypeptide encoded by the hybridizing cDNA or genomic
DNA and the polypeptide encoded by the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe
was derived, the polypeptide sequence encoded by the hybridized nucleic acid and the polypeptide sequence encoded
by the 5'EST, consensus contigated 5'EST or extended cDNA from which the probe was derived are compared. The
sequences of the polypeptide encoded by the 5'EST, consensus contigated 5'EST or extended cDNA from which the
probe was derived and the polypeptide sequence encoded by the cDNA or genomic DNA which hybridized to the
detectable probe may be stored on a computer readable medium as described below and compared to one another
using any of a variety of algorithms familiar to those skilled in the art, including those described below.

[0242] Protein and/or nucleic acid sequence homologies may be evaluated using any of the variety of sequence 
comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means
limited to, TBLASTN, BLASTP, FASTA, TFASTA and CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci.
USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids
Res. 22:4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics
3:266-272).

[0243] In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using
the Basic Local Alignment Search Tool ("BLAST"), which is well known in the art (see, e.g., Karlin and Altschul,
1990, Proc. Natl. Acad. Sci. USA 87:2267-2268; Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al.,
1993, Nature Genetics 3:266-272; Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402). In particular, five
specific BLAST programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database; 

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands)
against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six 
reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame
translations of a nucleotide sequence database.
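The five tasks listed above amount to a lookup from query type and database type to a program. A minimal Python sketch of that lookup follows; the names `BlastTask`, `BLAST_TASKS` and `programs_for` are hypothetical, and BLAST3 is left out because the text gives it the same query and database types as BLASTP.

```python
from typing import NamedTuple


class BlastTask(NamedTuple):
    query: str                 # "protein" or "nucleotide"
    database: str              # "protein" or "nucleotide"
    translates_query: bool     # query translated in six frames
    translates_database: bool  # database translated in six frames


# One entry per program, as described in items (1) to (5).
BLAST_TASKS = {
    "BLASTP": BlastTask("protein", "protein", False, False),
    "BLASTN": BlastTask("nucleotide", "nucleotide", False, False),
    "BLASTX": BlastTask("nucleotide", "protein", True, False),
    "TBLASTN": BlastTask("protein", "nucleotide", False, True),
    "TBLASTX": BlastTask("nucleotide", "nucleotide", True, True),
}


def programs_for(query, database):
    """Names of the programs that accept this query/database combination."""
    return [name for name, task in BLAST_TASKS.items()
            if task.query == query and task.database == database]


print(programs_for("nucleotide", "nucleotide"))
```

A nucleotide query against a nucleotide database matches two programs, which differ only in whether the comparison is made at the nucleotide level or on six-frame translations.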



[0244] The BLAST programs identify homologous sequences by identifying similar segments, which are referred to
herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a test sequence which is
preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably
identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring
matrix used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and Henikoff,
1993, Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and
Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and
Structure, Washington: National Biomedical Research Foundation).
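The idea of a high-scoring segment pair can be illustrated with a deliberately simplified Python sketch. It is not the BLAST algorithm: it scores only one fixed, ungapped alignment of two equal-frame sequences, and it uses a toy scheme (+1 match, -1 mismatch, both assumptions) in place of a real matrix such as BLOSUM62. The function name is hypothetical.

```python
def best_segment_pair(query, subject, match=1, mismatch=-1):
    """Best-scoring ungapped segment when the two sequences are aligned
    position by position. Returns (score, start, end), end exclusive."""
    best = (0, 0, 0)
    running, start = 0, 0
    for i, (q, s) in enumerate(zip(query, subject)):
        running += match if q == s else mismatch
        if running <= 0:
            # A segment that has dropped to zero cannot help a later one.
            running, start = 0, i + 1
        elif running > best[0]:
            best = (running, start, i + 1)
    return best


print(best_segment_pair("ACGTTTACGT", "ACGAAAACGT"))  # → (4, 6, 10)
```

With a substitution matrix, the per-position score would be looked up for each residue pair instead of being a fixed match or mismatch value.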

[0245] The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified,
and preferably select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated
using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA
87:2267-2268).
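The text does not reproduce the significance formula. As a hedged illustration, the widely used Karlin-Altschul expression for the expected number of chance segment pairs with score at least S is E = K·m·n·e^(-λS), where m and n are the query and database lengths. The Python sketch below assumes that form; the values of `lam` and `k` depend on the scoring system and are placeholders here, not values from the text.

```python
import math


def expected_chance_hits(score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lambda * S) for a segment pair of raw score S."""
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(expected):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-expected)


# Placeholder parameters, chosen only to show the trend with score.
for s in (20, 40, 60):
    e = expected_chance_hits(s, query_len=300, db_len=1_000_000, lam=0.3, k=0.1)
    print(s, e, p_value(e))
```

Higher scores give exponentially smaller E values, which is why a threshold on significance selects only the strongest segment pairs.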

[0246] The parameters used with the above algorithms may be adapted depending on the sequence length and
degree of homology studied. In some embodiments, the parameters may be the default parameters used by the
algorithms in the absence of instructions from the user.

[0247] In some embodiments, the level of homology between the hybridized nucleic acid and the extended cDNA,
5'EST, or consensus contigated 5'EST from which the probe was derived may be determined using the FASTDB
algorithm described in Brutlag et al., Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be
selected as follows: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30, Randomization Group
Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the sequence
which hybridizes to the probe, whichever is shorter. Because the FASTDB program does not consider 5' or 3'
truncations when calculating homology levels, if the sequence which hybridizes to the probe is truncated relative to
the sequence of the extended cDNA, 5'EST, or consensus contigated 5'EST from which the probe was derived, the
homology level is manually adjusted by calculating the number of nucleotides of the extended cDNA, 5'EST, or
consensus contigated 5'EST which are not matched or aligned with the hybridizing sequence, determining the
percentage of total nucleotides of that sequence which the non-matched or non-aligned nucleotides
represent, and subtracting this percentage from the homology level. For example, if the hybridizing sequence is 700
nucleotides in length and the extended cDNA, 5'EST, or consensus contigated 5'EST sequence is 1000 nucleotides in
length, wherein the first 300 bases at the 5' end of the extended cDNA, 5'EST, or consensus contigated 5'EST are
absent from the hybridizing sequence, and wherein the overlapping 700 nucleotides are identical, the homology level
would be adjusted as follows. The non-matched, non-aligned 300 bases represent 30% of the length of the extended
cDNA, 5'EST, or consensus contigated 5'EST. If the overlapping 700 nucleotides are 100% identical, the adjusted
homology level would be 100-30=70% homology. It should be noted that the preceding adjustments are only
made when the non-matched or non-aligned nucleotides are at the 5' or 3' ends. No adjustments are made if the non-
matched or non-aligned sequences are internal or under any other conditions.
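The manual truncation adjustment is simple arithmetic and can be written out as a short Python sketch. The function and parameter names are hypothetical; the calculation follows the worked example in the text, where only terminal (5' or 3') unaligned positions of the reference sequence are counted.

```python
def adjusted_homology(reference_len, unaligned_terminal, overlap_identity_pct):
    """Subtract the share of the reference left unaligned at its ends.

    reference_len        length of the extended cDNA, 5'EST or consensus 5'EST
    unaligned_terminal   reference positions at the 5' or 3' end with no match
    overlap_identity_pct homology reported over the overlapping region
    """
    if reference_len <= 0:
        raise ValueError("reference_len must be positive")
    if not 0 <= unaligned_terminal <= reference_len:
        raise ValueError("unaligned_terminal must lie within the reference")
    penalty_pct = 100.0 * unaligned_terminal / reference_len
    return overlap_identity_pct - penalty_pct


# Worked example: 1000 nt reference, first 300 nt absent, overlap identical.
print(adjusted_homology(1000, 300, 100.0))  # → 70.0
```

Internal gaps or mismatches are not passed to this function, since the text applies the correction to terminal truncations only.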

[0248] For example, using the above methods, nucleic acids having at least 95% nucleic acid homology, at least
96% nucleic acid homology, at least 97% nucleic acid homology, at least 98% nucleic acid homology, at least 99%
nucleic acid homology, or more than 99% nucleic acid homology to the extended cDNA, 5'EST or consensus
contigated 5'EST from which the probe was derived may be obtained and identified. Such nucleic acids may be allelic
variants or related nucleic acids from other species. Similarly, by using progressively less stringent hybridization
conditions one can obtain and identify nucleic acids having at least 90%, at least 85%, at least 80% or at least 75%
homology to the extended cDNA, 5'EST, or consensus contigated 5'EST from which the probe was derived.

[0249] Using the above methods and algorithms such as FASTA, with parameters depending on the sequence
length and degree of homology studied, for example the default parameters used by the algorithms in the absence of
instructions from the user, one can obtain nucleic acids encoding proteins having at least 99%, at least 98%, at
least 97%, at least 96%, at least 95%, at least 90%, at least 85%, at least 80% or at least 75% homology to the
protein encoded by the extended cDNA, 5'EST, or consensus contigated 5'EST from which the probe was derived.

[0250] Alternatively, the level of polypeptide homology may be determined using the FASTDB algorithm described
by Brutlag et al., Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be selected as follows:
Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1,
Window Size=Sequence Length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the
homologous sequence, whichever is shorter. If the homologous amino acid sequence is shorter than the amino acid
sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5'EST as a result of an N terminal and/or C
terminal deletion, the results may be manually corrected as follows. First, the number of amino acid residues of the
amino acid sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5'EST which are not matched or
aligned with the homologous sequence is determined. Then, the percentage of the length of the sequence encoded
by the extended cDNA, 5'EST, or consensus contigated 5'EST which the non-matched or non-aligned amino acids
represent is calculated. This percentage is subtracted from the homology level. For example, wherein the amino acid
sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5'EST is 100 amino acids in length and the
length of the homologous sequence is 80 amino acids and wherein the amino acid sequence encoded by the extended
cDNA, 5'EST, or consensus contigated 5'EST is truncated at the N terminal end with respect to the homologous
sequence, the homology level is calculated as follows. In the preceding scenario there are 20 non-matched,
non-aligned amino acids in the sequence encoded by the extended cDNA, 5'EST, or consensus contigated 5'EST. This
represents 20% of the length of the amino acid sequence encoded by the extended cDNA, 5'EST, or consensus
contigated 5'EST. If the remaining amino acids are 100% identical between the two sequences, the homology level
would be 100%-20%=80% homology. No adjustments are made if the non-matched or non-aligned sequences are
internal or under any other conditions.

[0251] In addition to the above described methods, other protocols are available to obtain extended cDNAs using
5'ESTs or consensus contigated 5'ESTs as outlined in the following paragraphs.

[0252] Extended cDNAs may be prepared by obtaining mRNA from the tissue, cell, or organism of interest using
mRNA preparation procedures utilizing polyA selection procedures or other techniques known to those skilled in the
art. A first primer capable of hybridizing to the polyA tail of the mRNA is hybridized to the mRNA and a reverse
transcription reaction is performed to generate a first cDNA strand.

[0253] The first cDNA strand is hybridized to a second primer containing at least 10 consecutive nucleotides of
the sequences of SEQ ID NOs 24-4100 and 8178-36681. Preferably, the primer comprises at least 10, 12, 15, 17, 18,
20, 23, 25, or 28 consecutive nucleotides from the sequences of SEQ ID NOs 24-4100 and 8178-36681. In some
embodiments, the primer comprises more than 30 nucleotides from the sequences of SEQ ID NOs 24-4100 and 8178-
36681. If it is desired to obtain extended cDNAs containing the full protein coding sequence, including the
authentic translation initiation site, the second primer used contains sequences located upstream of the translation
initiation site. The second primer is extended to generate a second cDNA strand complementary to the first cDNA
strand. Alternatively, RT-PCR may be performed as described above using primers from both ends of the cDNA to be
obtained.

[0254] Extended cDNAs containing 5' fragments of the mRNA may be prepared by hybridizing an mRNA
comprising the sequences of SEQ ID NOs: 24-4100 and 8178-36681 with a primer comprising a sequence
complementary to a fragment of an EST-related nucleic acid, and reverse transcribing the hybridized
primer to make a first cDNA strand from the mRNAs. Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20,
23, 25, or 28 consecutive nucleotides of the sequences complementary to SEQ ID NOs: 24-4100 and 8178-36681.
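Choosing a primer of consecutive nucleotides complementary to a known sequence can be sketched in Python. This is an illustration under assumptions: the function names and the example sequence are hypothetical, the template is written as the sense-strand DNA sequence, and no melting-temperature or specificity checks are made.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_primer(sense_seq, start, length=20, min_length=10):
    """Primer (5' to 3') complementary to `length` consecutive nucleotides
    of the sense sequence, beginning at 0-based position `start`."""
    if length < min_length:
        raise ValueError(f"primer must span at least {min_length} nucleotides")
    window = sense_seq[start:start + length]
    if len(window) < length:
        raise ValueError("window runs past the end of the sequence")
    return reverse_complement(window)


# Hypothetical sequence, used only to show the call.
print(complementary_primer("ATGGCCAAGTTCGA", start=0, length=10))  # → ACTTGGCCAT
```

The `min_length` default mirrors the minimum of 10 consecutive nucleotides stated in the text; the longer preferred lengths can be passed as `length`.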

[0255] Thereafter, a second cDNA strand complementary to the first cDNA strand is synthesized. The second
cDNA strand may be made by hybridizing a primer complementary to sequences in the first cDNA strand to the first
cDNA strand and extending the primer to generate the second cDNA strand.

[0256] The double stranded extended cDNAs made using the methods described above are isolated and cloned.

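[0257] Methods involving hybridizing a primer to mRNA to generate a first cDNA strand, synthesizing a second
cDNA strand complementary to the first cDNA strand, isolating the double stranded cDNA and cloning the double
stranded cDNA are well known to those skilled in the art and are described in Current Protocols in Molecular
Biology, John Wiley and Sons, and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition,
Cold Spring Harbor Laboratory Press.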

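[0258] Alternatively, other procedures may be used for obtaining full length cDNAs or extended cDNAs. In one
approach, full length or extended cDNAs are prepared from mRNA and cloned into double stranded phagemids as
follows. The cDNA library in the double stranded phagemids is then rendered single stranded by treatment with an
endonuclease, such as the Gene II product of the phage F1, and an exonuclease (Chang et al., Gene 127:95-8, 1993).
A biotinylated oligonucleotide comprising the sequence of a fragment of an EST-related nucleic acid is hybridized to
the single stranded phagemids. Preferably, the fragment comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28
consecutive nucleotides of the EST-related nucleic acid.

[0259] Hybrids between the biotinylated oligonucleotide and phagemids are isolated by incubating the hybrids
with streptavidin coated paramagnetic beads and retrieving the beads with a magnet (Fry et al., Biotechniques 13:124-
131, 1992). Thereafter, the resulting phagemids are released from the beads and converted into double stranded DNA
using a primer specific for the 5'EST or consensus contigated 5'EST sequence used to design the biotinylated
oligonucleotide. Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The resulting double
stranded DNA is transformed into bacteria. Extended cDNAs or full length cDNAs containing the 5'EST or consensus
contigated 5'EST sequence are identified by colony PCR or colony hybridization.

[0260] Using any of the above described methods, a plurality of extended cDNAs containing full-length protein
coding sequences or portions of the protein coding sequences may be provided as cDNA libraries for subsequent
evaluation of the encoded proteins.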

EXAMPLE 19

Full Length cDNAs

[0261] [...]

[0262] [...] Heijne [...]

[0263] [...] of 8.2.

[0264] [...] of 10.7 [...]

[0265] Polypeptides encoded by the extended cDNAs may be screened for the presence of signatures, small amino
acid sequences which are well conserved amongst the members of a protein family. Results for the polypeptides
encoded by a few full-length cDNAs derived from the 5'ESTs are given below.

[0266] The protein encoded by the full-length cDNA SEQ ID NO: 7 (internal designation 78-8-3-[...]) shows
homology to a phosphatidylethanolamine-binding protein, from which it exhibits the characteristic PROSITE
signature. Proteins from this widespread family are found from nematodes to fly, yeast, rodent and primate
species. [...] diseases, and/or disorders related to male fertility and sterility.

[0267] The protein of SEQ ID NO: 10 encoded by the full-length cDNA SEQ ID NO: 9 (internal designation 108-013-
5-0-H9-FLC) shows homologies with a family of lysophospholipases conserved among eukaryotes (yeast, rabbit,
rodents and human). In addition, some members of this family exhibit a calcium-independent phospholipase A2 activity
(Portilla et al., J. Am. Soc. Nephro. 9:1178-1186 (1998)). All members of this family exhibit the active site consensus
GXSXG motif of carboxylesterases that is also found in the protein of SEQ ID NO: 10 (positions 54 to 58). In addition,
this protein may be a membrane protein with one transmembrane domain as predicted by the software TopPred II
(Claros and von Heijne, CABIOS applic. Notes 10:685-686 (1994)). Taken together, these data suggest that the
protein of SEQ ID NO: 10 may play a role in fatty acid metabolism, probably as a phospholipase. Thus, this protein, or
part therein, may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer,
diabetes, and neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. It may also be useful in
modulating inflammatory responses to infectious agents and/or to suppress graft rejection.

[0268] The protein of SEQ ID NO: 12 encoded by the full length cDNA SEQ ID NO: 11 (internal designation 108-
004-5-0-D10-FLC) shows remote homology to a subfamily of beta4-galactosyltransferases widely conserved in animals
(including chicken). Such enzymes, usually type II membrane proteins located in the endoplasmic
reticulum or in the Golgi apparatus, catalyze the biosynthesis of glycoproteins, glycolipid glycans and lactose.
Their characteristic features, defined as those of subfamily A in Breton et al., J. Biochem. 123:1000-1009 (1998), are
conserved in the protein of SEQ ID NO: 12, especially the region containing the DVD motif (positions [...]),
which is thought to be involved either in UDP binding or in the catalytic process. In addition, the protein
of SEQ ID NO: 12 has the typical structure of a type II protein. Indeed, it contains a short 28-amino-acid-long N-
terminal tail, a transmembrane segment from positions 29 to 49 and a large 278-amino-acid-long C-terminal tail as
predicted by the software TopPred II (Claros and von Heijne, CABIOS applic. Notes 10:685-686 (1994)). Taken
together, these data suggest that the protein of SEQ ID NO: 12 may play a role in the biosynthesis of
polysaccharides and of the carbohydrate moieties of glycoproteins and glycolipids and/or in [...].
Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited
to, cardiovascular disorders, autoimmune disorders and rheumatic diseases including rheumatoid arthritis.

[0269] The protein of SEQ ID NO: 14 encoded by the full-length cDNA SEQ ID NO: 13 (internal designation 108-
009-5-0-A2-FLC) shows extensive homology to the bZIP family of transcription factors, and especially to the human
luman protein (Lu et al., Mol. Cell. Biol. 17:5117-5126 (1997)). The match includes the whole bZIP domain, composed
of a basic DNA-binding domain and of a leucine zipper allowing protein dimerization. The basic domain is conserved
in the protein of SEQ ID NO: 14 as shown by the characteristic PROSITE signature (positions 224-237) except for a
conservative substitution of a glutamic acid with an aspartic acid in position 233. The typical PROSITE signature
for leucine zipper is also present (positions [...] to 280). Taken together, these data suggest that the protein of
SEQ ID NO: 14 may bind to DNA, hence regulating gene expression as a transcription factor. Thus, this protein may be
useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer.

[0270] Bacterial clones containing plasmids containing the full length cDNAs described above are presently
stored in the inventor's laboratories under the internal identification numbers provided above. The inserts may be
recovered from the stored materials by growing an aliquot of the appropriate bacterial clone in the appropriate
medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art
such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired, the plasmid
DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion
exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using
standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers
designed at both ends of the EST insertion. The PCR product which corresponds to the 5'EST can then be manipulated
using standard cloning techniques familiar to those skilled in the art.



IV. Expression of Proteins 



[0271] EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related
nucleic acids, and fragments of positional segments of EST-related nucleic acids may be used to express the
polypeptides which they encode. In particular, they may be used to express EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of
EST-related polypeptides. In some embodiments, the EST-related nucleic acids, positional segments of EST-related
nucleic acids and fragments of positional segments of EST-related nucleic acids may be used to express the full
polypeptide (i.e. the signal peptide and the mature polypeptide) of a secreted protein, the mature protein (i.e. the
polypeptide generated after cleavage of the signal peptide), or the signal peptide of a secreted protein. If desired,
nucleic acids encoding the signal peptide may be used to facilitate secretion of the expressed protein. It will be
appreciated that a plurality of EST-related nucleic acids, fragments of EST-related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be
simultaneously cloned into expression vectors to create an expression library for analysis of the encoded proteins
as described below.

EXAMPLE 20 

Expression of the Proteins Encoded by the Genes Corresponding to the 5'ESTs or Consensus Contigated 5'ESTs

[0272] To express their encoded proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids
are cloned into a suitable expression vector. In some instances, nucleic acids encoding EST-related polypeptides,
fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional
segments of EST-related polypeptides may be cloned into a suitable expression vector.

[0273] In some embodiments, the nucleic acids inserted into the expression vector may comprise the coding
sequence of a sequence selected from the group consisting of SEQ ID NOs: 24-4100. In other embodiments, the nucleic
acids inserted into the expression vector may comprise the full coding sequence (i.e. the nucleotides
encoding the signal peptide and the mature polypeptide) of one of SEQ ID NOs: 3721-3811. In some embodiments, the
nucleic acid inserted into the expression vector may comprise the nucleotides of one of the sequences of SEQ ID NOs:
3721-3811 which encode the mature polypeptide (i.e. the nucleotides encoding the polypeptide generated after
cleavage of the signal peptide). In further embodiments, the nucleic acids inserted into the expression vector may
comprise the nucleotides of SEQ ID NOs: 24-652 and 3721-3811 which encode the signal peptide to facilitate secretion
of the expressed protein. The nucleic acids inserted into the expression vectors may also contain sequences upstream
of the sequences encoding the signal peptide, such as sequences which regulate expression levels or sequences which
confer tissue specific expression.

[0274] In some embodiments, the nucleic acids inserted into the expression vector may encode a polypeptide
comprising one of the sequences of SEQ ID NOs: [...]. In other embodiments, the nucleic acid inserted into the
expression vector may encode the full polypeptide sequence (i.e. the signal peptide and the mature polypeptide)
included in one of SEQ ID NOs: 7798-7888. In other embodiments, the nucleic acid inserted into the expression vector
may encode the mature polypeptide (i.e. the polypeptide generated after cleavage of the signal peptide) included in
one of the sequences of SEQ ID NOs: 7798-7888. In further embodiments, the nucleic acids inserted into the
expression vector may encode the signal peptide included in one of the sequences of SEQ ID NOs: [...].

[0275] [...] facilitate proper [...] the particular expression organism in which the expression vector is
introduced [...]

[0276] In some cases, the nucleic acid encoding the protein or polypeptide to be expressed includes a
methionine initiation codon and a polyA signal. If the nucleic acid encoding the polypeptide to be expressed lacks a
methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of
the nucleic acid using conventional techniques. Similarly, if the nucleic acid encoding the protein or polypeptide
to be expressed lacks a polyA signal, this sequence can be added to the construct by, for example, splicing out the
polyA signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into
the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from
Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The
vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The nucleic acid
encoding the polypeptide to be expressed is obtained by PCR from the bacterial vector using oligonucleotide primers
complementary to the nucleic acid encoding the protein or polypeptide to be expressed and containing restriction
endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the 3' primer, taking care to
ensure that the nucleic acid encoding the protein or polypeptide to be expressed is correctly positioned with
respect to the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI,
blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A signal
and digested with BglII.
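At the sequence-design stage, the check for an initiating methionine can be expressed as a small Python sketch. It is a design-time illustration only, not the laboratory procedure: the function name is hypothetical, and it assumes the input is an in-frame coding sequence made of whole codons.

```python
def ensure_start_codon(coding_seq):
    """Return the coding sequence with an ATG placed next to the first codon
    if it does not already begin with one."""
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence must consist of whole codons")
    return seq if seq.startswith("ATG") else "ATG" + seq


print(ensure_start_codon("gccaaa"))  # → ATGGCCAAA
print(ensure_start_codon("ATGGCC"))  # → ATGGCC
```

In practice the added codon would be carried on the 5' PCR primer, so the same primer can also supply the restriction site.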

[...] expanded. The expressed protein or polypeptide may be isolated, purified, or enriched as described above.

[...] the proteins or polypeptides produced by cells containing an expression vector with an insert encoding the
protein or polypeptide are compared to those produced by cells lacking such an insert. The expressed proteins are
detected using techniques familiar to those skilled in the art, such as Coomassie blue or silver staining, or using
antibodies against the protein or polypeptide encoded by the nucleic acid insert. Antibodies capable of
specifically recognizing the protein or polypeptide encoded by the nucleic acid insert [...]

[...] insert is being expressed and secreted. Generally, the band corresponding to the protein encoded by the
nucleic acid insert will have a mobility near that expected based on the number of amino acids in the open reading
frame of the nucleic acid insert. However, the band may have a mobility different than that expected as a result of
modifications such as glycosylation, ubiquitination, or enzymatic cleavage.

[0281] The proteins expressed from host cells containing an expression vector with an insert
encoding a secreted protein or portion thereof can be compared to the proteins expressed in control host cells
containing the expression vector without an insert. The presence of a band in samples from cells containing the
expression vector with an insert which is absent in samples from cells containing the expression vector without an
insert indicates that the desired protein or portion thereof is being expressed. Generally, the band will have the
mobility expected for the secreted protein or portion thereof. However, the band may have a mobility different than
that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
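The mobility expected from the number of amino acids can be approximated before running a gel. The Python sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the text; the function name and the example length are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def expected_mass_kda(orf_nucleotides, includes_stop=True):
    """Rough unmodified protein mass predicted from the length of an open reading frame."""
    codons, remainder = divmod(orf_nucleotides, 3)
    if remainder:
        raise ValueError("open reading frame must be a multiple of 3 nucleotides")
    residues = codons - 1 if includes_stop else codons
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Hypothetical 984-nucleotide open reading frame, stop codon included.
print(round(expected_mass_kda(984), 2))
```

A band that runs well away from this estimate is consistent with the modifications mentioned in the text, such as glycosylation or cleavage of a signal peptide.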
[0282] The expressed protein or polypeptide may be purified, isolated or enriched using a variety of methods. In 
some methods, the protein or polypeptide may be secreted into the culture medium via a native signal peptide or a 
heterologous signal peptide operably linked thereto. In some methods, the protein or polypeptide may be linked to a
heterologous polypeptide which facilitates its isolation, purification, or enrichment such as a nickel binding
polypeptide. The protein or polypeptide may also be obtained by gel electrophoresis, ion exchange chromatography,
size exclusion chromatography, HPLC, salt precipitation, immunoprecipitation, a combination of any of the preceding methods,
or any of the isolation, purification, or enrichment techniques familiar to those skilled in the art. 

[0283] The protein encoded by the nucleic acid insert may also be purified using standard immunoaffinity
chromatography techniques with antibodies directed against the encoded protein or polypeptide,
as described in more detail below. If antibody production is not possible, the nucleic acid insert encoding the
desired protein or polypeptide may be incorporated into expression vectors designed for use in purification schemes
employing chimeric polypeptides. In such strategies, the coding sequence of the nucleic acid insert is ligated in
frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel
binding polypeptide. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to
purify the chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel
binding polypeptide and the extended cDNA or portion thereof. Thus, the two polypeptides of the chimera may be
separated from one another by protease digestion.

[0284] One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene), which encodes
rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the
polyadenylation signal incorporated into the construct increases the level of expression. These techniques as
described are well known to those skilled in the art of molecular biology. Standard methods are published in methods
texts such as Davis et al. (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, ed.,
Elsevier Press, NY, 1986) and many of the methods are available from Stratagene, Life Technologies, Inc., or
Promega. Polypeptide may additionally be produced from the construct using in vitro translation systems such as
the In vitro Express™ Translation Kit (Stratagene).

[0285] Following expression and purification of the proteins or polypeptides encoded by the nucleic acid inserts, 
the purified proteins may be tested for the ability to bind to the surface of various cell types as described in 
Example 21 below. It will be appreciated that a plurality of proteins expressed from these nucleic acid inserts may 
be included in a panel of proteins to be simultaneously evaluated for the activities specifically described below, 
as well as other biological roles for which assays for determining activity are available. 

EXAMPLE 21 

Analysis of Secreted Proteins to Determine Whether they Bind to the Cell Surface 

[0286] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, fragments of positional segments of EST-related nucleic acids, nucleic acids encoding the EST-related 
polypeptides, nucleic acids encoding fragments of the EST-related polypeptides, nucleic acids encoding positional 
segments of EST-related polypeptides, or nucleic acids encoding fragments of positional segments of EST-related 
polypeptides are cloned into expression vectors such as those described in Example 20. The encoded proteins or 
polypeptides are purified, isolated, or enriched as described above. Following purification, isolation, or 
enrichment, the proteins or polypeptides are labeled using techniques known to those skilled in the art. The labeled 
proteins or polypeptides are incubated with cells or cell lines derived from a variety of organs or tissues to allow 
the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are washed to 
remove non-specifically bound proteins or polypeptides. The specifically bound labeled proteins or polypeptides are 
detected by autoradiography. Alternatively, unlabeled proteins or polypeptides may be incubated with the cells and 
detected with antibodies having a detectable label, such as a fluorescent molecule, attached thereto. 
[0287] Specificity of cell surface binding may be analyzed by conducting a competition analysis in which various 
amounts of unlabeled protein or polypeptide are incubated along with the labeled protein or polypeptide. The amount 
of labeled protein or polypeptide bound to the cell surface decreases as the amount of competitive unlabeled protein 
or polypeptide increases. As a control, various amounts of an unlabeled protein or polypeptide unrelated to the
labeled protein or polypeptide are included in some binding reactions. The amount of labeled protein or polypeptide
bound to the cell surface does not decrease in binding reactions containing increasing amounts of unrelated 
unlabeled protein, indicating that the protein or polypeptide encoded by the nucleic acid binds specifically to the 
cell surface. 
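
The logic of the competition analysis can be sketched in Python. Everything specific here is an assumption made for illustration: the function name, the cutoffs of a 50% drop with the related competitor and at most a 10% drop with the unrelated control, and the example counts, which are invented and not experimental data.

```python
def fractional_drop(bound_signal):
    """Fraction of bound label lost between the first and last reaction.

    bound_signal lists the bound labeled protein at increasing amounts of
    unlabeled protein; the first entry is the reaction with no competitor.
    """
    if not bound_signal or bound_signal[0] <= 0:
        raise ValueError("first reading must be a positive signal")
    return (bound_signal[0] - bound_signal[-1]) / bound_signal[0]


def binding_is_specific(competitor, unrelated, min_drop=0.5, max_control_drop=0.1):
    """True if the related competitor displaces label and the unrelated protein does not."""
    return (fractional_drop(competitor) >= min_drop
            and fractional_drop(unrelated) <= max_control_drop)


# Invented readings, for illustration only.
print(binding_is_specific([1000, 600, 250, 80], [1000, 980, 1010, 990]))  # → True
```

A real analysis would fit the full displacement curve; the two-point comparison here only captures the qualitative outcome described in the text.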

[0288] As discussed above, human proteins have been shown to have a number of important physiological effects 
and, consequently, represent a valuable therapeutic resource. The human proteins or polypeptides made as described
above may be evaluated to determine their physiological activities as described below. 

EXAMPLE 22

Assaying the Expressed Proteins or Polypeptides for Cytokine, Cell Proliferation or Cell Differentiation Activity

[0289] As discussed above, some human proteins act as cytokines or may affect cellular proliferation or 
differentiation. Many protein factors discovered to date, including all known cytokines, have exhibited activity in 
one or more factor dependent cell proliferation assays, and hence the assays serve as a convenient confirmation of 
cytokine activity. The activity of a protein or polypeptide of the present invention is evidenced by any one of a 
number of routine factor dependent cell proliferation assays for cell lines including, without limitation, 32D, DA2,
DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and
CMK. The proteins or polypeptides prepared as described above may be evaluated for their ability to regulate T cell
or thymocyte proliferation in assays such as those described above or in the following references: Current Protocols in
Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates and Wiley-Interscience; Takai et al., J.
Immunol. 137:3494-3500, 1986; Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J.
Immunol. 152:1756-1761, 1994.

[0290] In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, lymph node 
cells and thymocytes are known. These include the techniques disclosed in Current Protocols in Immunology, J.E.
Coligan et al. Eds., 1:3.12.1-3.12.14, John Wiley and Sons, Toronto, 1994; and Schreiber, R.D., Current Protocols in
Immunology, supra, 1:6.8.1-6.8.8.

[0291] The proteins or polypeptides prepared as described above may also be assayed for the ability to regulate
the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for such activity are
familiar to those skilled in the art, including the assays in the following references: Bottomly et al., In Current
Protocols in Immunology, supra, 1:6.3.1-6.3.12; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al.,
Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Nordan, R., In Current
Protocols in Immunology, supra, 1:6.6.1-6.6.5; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986;
Bennett et al., In Current Protocols in Immunology, supra, 1:6.15.1; Ciarletta et al., In Current Protocols in
Immunology, supra, 1:

[0292] The proteins or polypeptides prepared as described above may also be assayed for their ability to 
regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, 
including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function),
Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic Studies in Humans) in Current
Protocols in Immunology, supra; Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al.,
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512,
1988.

[0293] Those proteins or polypeptides which exhibit cytokine, cell proliferation, or cell differentiation 
activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell 
proliferation or differentiation is beneficial. Alternatively, as described in more detail below, nucleic acids 
encoding these proteins or polypeptides or nucleic acids regulating the expression of these proteins or polypeptides 
may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides 
as desired. 

EXAMPLE 23 

Assaying the Expressed Proteins or Polypeptides for Activity as Immune System Regulators

[0294] The proteins or polypeptides prepared as described above may also be evaluated for their effects as 
immune regulators. For example, the proteins or polypeptides may be evaluated for their activity to influence 
thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art,
including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function
3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al.
Eds., Greene Publishing Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492,
1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al.,
J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J. Virology 61:1992-1998;
Bertagnolli et al., Cell. Immunol. 133:327-341, 1991; Brown et al., J. Immunol. 153:3079-3092, 1994.

[0295] The proteins or polypeptides prepared as described above may also be evaluated for their effects on
T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are familiar to
those skilled in the art, including the assays disclosed in the following references: Maliszewski, J. Immunol.
144:3028-3033, 1990; Mond et al. in Current Protocols in Immunology, 1:3.8.1-3.8.16, supra.

[0296] The proteins or polypeptides prepared as described above may also be evaluated for their effect on immune 
effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays for such activity are 
familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 3 (In vitro Assays
for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in
Immunology, supra; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.

[0297] The proteins or polypeptides prepared as described above may also be evaluated for their effect on 
dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those skilled
in the art, including the assays disclosed in the following references: Guery et al., J. Immunol. 134:536-544, 1995; Inaba
et al., J. Exp. Med. 173:549-559, 1991; Macatonia et al., J. Immunol. 154:5071-5079, 1995; Porgador et al., J. Exp.
Med. 182:255-260, 1995; Nair et al., J. Virol. 67:4062-4069, 1993; Huang et al., Science 264:961-965, 1994;
Macatonia et al., J. Exp. Med. 169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994;
and Inaba et al., J. Exp. Med. 172:631-640, 1990.

[0298] The proteins or polypeptides prepared as described above may also be evaluated for their influence on the 
lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the 
assays disclosed in the following references: Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et al.,
Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Res. 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991;
Zacharchuk, J. Immunol. 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; Gorczyca et al., Int. J.
Oncol. 1:639-648, 1992.

[0299] The proteins or polypeptides prepared as described above may also be evaluated for their influence on 
early steps of T-cell commitment and development. Numerous assays for such activity are familiar to those skilled in
the art, including without limitation the assays disclosed in the following references: Antica et al., Blood 84:111-117,
1994; Fine et al., Cell. Immunol. 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Nat.
Acad. Sci. USA 88:7548-7551, 1991.

[0300] Those proteins or polypeptides which exhibit activity as immune system regulators may then be
formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity is 
beneficial. For example, the protein or polypeptide may be useful in the treatment of various immune deficiencies 
and disorders (including severe combined immunodeficiency), e.g., in regulating (up or down) growth and 
proliferation of T and/or B lymphocytes, as well as effecting the cytolytic activity of NK cells and other cell 
populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or 
fungal infections, or may result from autoimmune disorders. More specifically, infectious diseases caused by viral, 
bacterial, fungal or other infection may be treatable using the protein or polypeptide, including infections by HIV,
hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., Plasmodium, and various fungal infections such as
candidiasis. Of course, in this regard, a protein or polypeptide may also be useful where a boost to the immune
system generally may be desirable, i.e., in the treatment of cancer. 

[0301] Alternatively, the proteins or polypeptides prepared as described above may be used in treatment of 
autoimmune disorders including, for example, connective tissue disease, multiple sclerosis, systemic lupus 
erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune 
thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune
inflammatory eye disease. Such a protein or polypeptide may also be useful in the treatment of allergic reactions
and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. Other conditions, in 
which immune suppression is desired (including, for example, organ transplantation), may also be treatable using the 
protein or polypeptide. 

[0302] Using the proteins or polypeptides of the invention it may also be possible to regulate immune responses 
either up or down. Down regulation may involve inhibiting or blocking an immune response already in progress or may 
involve preventing the induction of an immune response. The functions of activated T-cells may be inhibited by 
suppressing T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression of T cell 
responses is generally an active non-antigen-specific process which requires continuous exposure of the T cells to 
the suppressive agent. Tolerance, which involves inducing non-responsiveness or anergy in T cells, is 
distinguishable from immunosuppression in that it is generally antigen-specific and persists after the end of 
exposure to the tolerizing agent. Operationally, tolerance can be demonstrated by the lack of a T cell response upon 
reexposure to specific antigen in the absence of the tolerizing agent. 

[0303] Down regulating or preventing one or more antigen functions (including without limitation B lymphocyte 
antigen functions, such as, for example, B7 costimulation), e.g., preventing high level lymphokine synthesis by 
activated T cells, will be useful in situations of tissue, skin and organ transplantation and in graft-versus-host 
disease (GVHD). For example, blockage of T cell function should result in reduced tissue destruction in tissue 
transplantation. Typically, in tissue transplants, rejection of the transplant is initiated through its recognition 
as foreign by T cells, followed by an immune reaction that destroys the transplant. The administration of a molecule 
which inhibits or blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as 
a soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric form of a 
peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to 
transplantation, can lead to the binding of the molecule to the natural ligand(s) on the immune cells without 
transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents
cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, the lack of 
costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of 
long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of 
these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, it may also be necessary 
to block the function of a combination of B lymphocyte antigens. 

[0304] The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD can be 
assessed using animal models that are predictive of efficacy in humans. Examples of appropriate systems which can be 
used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of which 
have been used to examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in
Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA 89:11102-11105 (1992). In
addition, murine models of GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 846-847)
can be used to determine the effect of blocking B lymphocyte antigen function in vivo on the development of that
disease. 

[0305] Blocking antigen function may also be therapeutically useful for treating autoimmune diseases. Many 
autoimmune disorders are the result of inappropriate activation of T cells that are reactive against self tissue and 
which promote the production of cytokines and autoantibodies involved in the pathology of the diseases. Preventing
the activation of autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents which 
block costimulation of T cells by disrupting receptor/ligand interactions of B lymphocyte antigens can be used to 
inhibit T cell activation and prevent production of autoantibodies or T cell-derived cytokines which are potentially
involved in the disease process. Additionally, blocking reagents may induce antigen-specific tolerance of
autoreactive T cells which could lead to long-term relief from the disease. The efficacy of blocking reagents in 
preventing or alleviating autoimmune disorders can be determined using a number of well-characterized animal models 
of human autoimmune diseases. Examples include murine experimental autoimmune encephalitis, systemic lupus
erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD
mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology, Raven Press,
New York, 1989, pp. 840-856).

[0306] Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of up 
regulating immune responses, may also be useful in therapy. Upregulation of immune responses may involve either 
enhancing an existing immune response or eliciting an initial immune response, as shown by the following examples.
For instance, enhancing an immune response through stimulating B lymphocyte antigen function may be useful in
cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold, and encephalitis
might be alleviated by the systemic administration of a stimulatory form of B lymphocyte antigens.
[0307] Alternatively, antiviral immune responses may be enhanced in an infected patient by removing T cells from 
the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing the proteins or 
polypeptides described above or together with a stimulatory form of the protein or polypeptide and reintroducing the in 
vitro primed T cells into the patient. The infected cells would now be capable of delivering a costimulatory signal 
to T cells in vivo, thereby activating the T cells. 

[0308] In another application, upregulation or enhancement of antigen function (preferably B lymphocyte antigen 
function) may be useful in the induction of tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, 
leukemia, neuroblastoma, carcinoma) transfected with one of the above-described nucleic acids encoding a protein or 
polypeptide can be administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the 
tumor cell can be transfected to express a combination of peptides. For example, tumor cells obtained from a patient 
can be transfected ex vivo with an expression vector directing the expression of a peptide having B7-2-like activity 
alone, or in conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor
cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell. 
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo. 
[0309] The presence of the protein or polypeptide encoded by the nucleic acids described above having the 
activity of a B lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation signal 
to T cells to induce a T cell mediated immune response against the transfected tumor cells. In addition, tumor cells 
which lack or which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules can be transfected 
with nucleic acids encoding all or a portion (e.g., a cytoplasmic-domain truncated portion) of an MHC class I α
chain and β2 microglobulin or an MHC class II α chain and an MHC class II β chain to thereby express MHC class I or
MHC class II proteins on the cell surface, respectively. Expression of the appropriate MHC class I or class II 
molecules in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) 
induces a T cell mediated immune response against the transfected tumor cell. Optionally, a nucleic acid encoding an 
antisense construct which blocks expression of an MHC class II associated protein, such as the invariant chain, can 
also be cotransfected with a DNA encoding a protein or polypeptide having the activity of a B lymphocyte antigen to 
promote presentation of tumor associated antigens and induce tumor specific immunity. Thus, the induction of a T 
cell mediated immune response in a human subject may be sufficient to overcome tumor-specific tolerance in the 
subject. Alternatively, as described in more detail below, nucleic acids encoding these immune system regulator 
proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be 
introduced into appropriate host cells to increase or decrease the expression of the proteins as desired. 

EXAMPLE 24 

Assaying the Expressed Proteins or Polypeptides for Hematopoiesis Regulating Activity 

[0310] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their 
hematopoiesis regulating activity. For example, the effect of the proteins or polypeptides on embryonic stem cell 
differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the art, 
including the assays disclosed in the following references: Johansson et al., Cell. Biol. 15:141-151, 1995; Keller et al.,
Mol. Cell. Biol. 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 1993.

[0311] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their 
influence on the lifetime of stem cells and stem cell differentiation. Numerous assays for such activity are 
familiar to those skilled in the art, including the assays disclosed in the following references: Freshney, M.G. 
Methylcellulose Colony Forming Assays, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 265-268,
Wiley-Liss, Inc., New York, NY. 1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; McNiece, I.K.
and Briddell, R.A. Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, in Culture of
Hematopoietic Cells, R.I. Freshney, et al. Eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, NY. 1994; Neben et
al., Experimental Hematology 22:353-359, 1994; Ploemacher, R.E. Cobblestone Area Forming Cell Assay, in Culture
of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New York, NY. 1994; Spooncer, E.,
Dexter, M. and Allen, T. Long Term Bone Marrow Cultures in the Presence of Stromal Cells, in Culture of
Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New York, NY. 1994; and Sutherland,
H.J. Long Term Culture Initiating Cell Assay, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 139-162,
Wiley-Liss, Inc., New York, NY. 1994.

[0312] Those proteins or polypeptides which exhibit hematopoiesis regulatory activity may then be formulated as 
pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial. For
example, a protein or polypeptide of the present invention may be useful in regulation of hematopoiesis and, 
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological activity in 
support of colony forming cells or of factor-dependent cell lines indicates involvement in regulating hematopoiesis, 
e.g. in supporting the growth and proliferation of erythroid progenitor cells alone or in combination with other 
cytokines, thereby indicating utility, for example, in treating various anemias or for use in conjunction with 
irradiation/chemotherapy to stimulate the production of erythroid precursors and/or erythroid cells; in supporting 
the growth and proliferation of myeloid cells such as granulocytes and monocytes/macrophages (i.e., traditional CSF 
activity) useful, for example, in conjunction with chemotherapy to prevent or treat consequent myelo-suppression; in 
supporting the growth and proliferation of megakaryocytes and consequently of platelets, thereby allowing prevention
or treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or
complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic stem
cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and therefore find
therapeutic utility in various stem cell disorders (such as those usually treated with transplantation, including,
without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem 
cell compartment post irradiation/chemotherapy, either in-vivo or ex-vivo (i.e., in conjunction with bone marrow 
transplantation or with peripheral progenitor cell transplantation (homologous or heterologous)) as normal cells or 
genetically manipulated for gene therapy. Alternatively, as described in more detail below, nucleic acids encoding 
these proteins or polypeptides or nucleic acids regulating the expression of these proteins or polypeptides may be 
introduced into appropriate host cells to increase or decrease the expression of the proteins as desired. 

EXAMPLE 25 

Assaying the Expressed Proteins or Polypeptides for Regulation of Tissue Growth

[0313] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their 
effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art, including the 
assays disclosed in International Patent Publication No. WO95/16035, International Patent Publication No. 
WO95/05846 and International Patent Publication No. WO91/07491.

[0314] Assays for wound healing activity include, without limitation, those described in: Winter, Epidermal Wound
Healing, pp. 71-112 (Maibach, HI and Rovee, DT, eds.), Year Book Medical Publishers, Inc., Chicago, as modified by
Eaglstein and Mertz, J. Invest. Dermatol. 71:382-84 (1978).

[0315] Those proteins or polypeptides which are involved in the regulation of tissue growth may then be 
formulated as pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is 
beneficial. For example, a protein or polypeptide may have utility in compositions used for bone, cartilage, tendon, 
ligament and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair and replacement, 
and in the treatment of burns, incisions and ulcers. 

[0316] A protein or polypeptide encoded by the nucleic acids described above which induces cartilage and/or bone 
growth in circumstances where bone is not normally formed, has application in the healing of bone fractures and 
cartilage damage or defects in humans and other animals. Such a preparation employing a protein or polypeptide of 
the invention may have prophylactic use in closed as well as open fracture reduction and also in the improved 
fixation of artificial joints. De novo bone synthesis induced by an osteogenic agent contributes to the repair of 
congenital, trauma induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic 
plastic surgery. 

[0317] A protein or polypeptide of this invention may also be used in the treatment of periodontal disease, and 
in other tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate 
growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of the 
invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation of bone 
and/or cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase activity, 
osteoclast activity, etc.) mediated by inflammatory processes. 

[0318] Another category of tissue regeneration activity that may be attributable to the proteins or polypeptides 
encoded by the nucleic acids described above is tendon/ligament formation. A protein or polypeptide encoded by the 
nucleic acids described above, which induces tendon/ligament-like tissue or other tissue formation in circumstances 
where such tissue is not normally formed, has application in the healing of tendon or ligament tears, deformities 
and other tendon or ligament defects in humans and other animals. Such a preparation employing a
tendon/ligament-like tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament
tissue as well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a protein or
polypeptide of the present invention contributes to the repair of tendon or ligament defects of congenital,
traumatic or other origin and is
also useful in cosmetic plastic surgery for attachment or repair of tendons or ligaments. The proteins or 
polypeptides of the present invention may provide an environment to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or
ligament-forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect
tissue repair. The proteins or polypeptides of the invention may also be useful in the treatment of tendinitis, carpal
tunnel
syndrome and other tendon or ligament defects. The therapeutic compositions may also include an appropriate matrix 
and/or sequestering agent as a carrier as is well known in the art. 

[0319] The proteins or polypeptides of the present invention may also be useful for proliferation of neural 
cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous 
system diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration, death 
or trauma to neural cells or nerve tissue. More specifically, a protein or polypeptide may be used in the treatment 
of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized 
neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's disease, Huntington's disease,
amyotrophic lateral sclerosis, and Shy-Drager syndrome. Further conditions which may be treated in accordance with
the present invention include mechanical and traumatic disorders, such as spinal cord disorders, head trauma and 
cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from chemotherapy or other medical
therapies may also be treatable using a protein or polypeptide of the invention.

[0320] Proteins or polypeptides of the invention may also be useful to promote better or faster closure of non- 
healing wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency,
surgical and traumatic wounds, and the like. 

[0321] It is expected that a protein or polypeptide of the present invention may also exhibit activity for 
generation or regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine,
kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium) tissue,
or for promoting the growth of cells comprising such tissues. Part of the desired effects may be by inhibition or 
modulation of fibrotic scarring to allow normal tissue to generate. A protein or polypeptide of the invention may 
also exhibit angiogenic activity. 

[0322] A protein or polypeptide of the present invention may also be useful for gut protection or regeneration 
and treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from 
systemic cytokine damage. 

[0323] A protein or polypeptide of the present invention may also be useful for promoting or inhibiting 
differentiation of tissues described above from precursor tissues or cells; or for inhibiting the growth of tissues 
described above. 

[0324] Alternatively, as described in more detail below, nucleic acids encoding tissue growth regulating 
activity proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be 
introduced into appropriate host cells to increase or decrease the expression of the proteins as desired. 

EXAMPLE 26

Assaying the Expressed Proteins or Polypeptides for Regulation of Reproductive Hormones

[0325] The proteins or polypeptides of the present invention may also be evaluated for their ability to regulate 
reproductive hormones, such as follicle stimulating hormone. Numerous assays for such activity are familiar to those 
skilled in the art, including the assays disclosed in the following references: Vale et al., Endocrinol. 91:562-572, 1972;
Ling et al., Nature 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985;
Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986; Chapter 6.12 in Current Protocols in Immunology, J.E.
Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376,
1995; Lind et al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J.
Immunol. 152:5860-5867, 1994; Johnston et al., J. Immunol. 153:1762-1768, 1994.

[0326] Those proteins or polypeptides which exhibit activity as reproductive hormones or regulators of cell 
movement may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of
reproductive hormones is beneficial. For example, a protein or polypeptide may exhibit activin- or inhibin-related
activities. Inhibins are characterized by their ability to inhibit the release of follicle stimulating hormone (FSH) 
while activins are characterized by their ability to stimulate the release of FSH. Thus, a protein or polypeptide of 
the present invention, alone or in heterodimers with a member of the inhibin α family, may be useful as a
contraceptive based on the ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis 
in male mammals. Administration of sufficient amounts of other inhibins can induce infertility in these 
mammals. Alternatively, the protein or polypeptide of the invention, as a homodimer or as a heterodimer with other
protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the ability
of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for example, United
States Patent 4,798,885. A protein or polypeptide of the invention may also be useful for advancement of the onset
of fertility in sexually immature mammals, so as to increase the lifetime reproductive performance of domestic
animals such as cows, sheep and pigs.



EXAMPLE 27 

Assaying the Expressed Proteins or Polypeptides for Chemotactic/Chemokinetic Activity

The proteins or polypeptides of the present invention may also be evaluated for chemotactic/chemokinetic
neu.ph, to tumors 

A protein or polypeptide has chemotactic activity for a particular cell population if it can stimulate, directly
or indirectly, the directed orientation or movement of such cell population. Preferably, the protein or
polypeptide has the ability to directly stimulate directed movement of cells. Whether a particular protein or
polypeptide has chemotactic activity for a population of cells can be readily determined by employing such protein or
polypeptide in any known assay for cell chemotaxis.

The activity of a protein or polypeptide of the invention may, among other means, be measured by the
following methods:

Assays for chemotactic activity (which will identify proteins or polypeptides that induce or prevent

EXAMPLE 28 

Assaying the Expressed Proteins or Polypeptides for Regulation of Blood Clotting

The proteins or polypeptides of the present invention may also be evaluated for their effects on blood
EXAMPLE 29 

Assaying the Expressed Proteins or Polypeptides for Involvement in Receptor/Ligand Interactions

Chapter 7.28 in Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and
Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987;

[0335] For example, the proteins or polypeptides of the present invention may also demonstrate activity as
receptors, receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such receptors
and ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their ligands,
receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands (including,
without limitation, cellular adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand
pairs involved in antigen presentation, antigen recognition and development of cellular and humoral immune responses).
Receptors and ligands are also useful for screening of potential peptide or small molecule inhibitors of the
relevant receptor/ligand interaction. A protein or polypeptide of the present invention (including, without
limitation, fragments of receptors and ligands) may be useful as an inhibitor of receptor/ligand interactions.
Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in
receptor/ligand interactions or nucleic acids regulating the expression of such proteins or polypeptides may be
introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 30 

Assaying the Proteins or Polypeptides for Anti-Inflammatory Activity

[0336] The proteins or polypeptides of the present invention may also be evaluated for anti-inflammatory
activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by
inhibiting or promoting chemotaxis of cells involved in the inflammatory process, by inhibiting or promoting cell
extravasation, or by stimulating or suppressing production of other factors which more directly inhibit or promote
an inflammatory response. Proteins or polypeptides exhibiting such activities can be used to treat inflammatory
conditions including chronic or acute conditions, including without limitation inflammation associated with
infection (such as septic shock, sepsis or systemic inflammatory response syndrome), ischemia-reperfusion injury,
endotoxin lethality, arthritis, complement-mediated hyperacute rejection, nephritis, cytokine- or chemokine-induced
lung injury, inflammatory bowel disease, Crohn's disease or resulting from over production of cytokines such as TNF
or IL-1. Proteins or polypeptides of the invention may also be useful to treat anaphylaxis and hypersensitivity to
an antigenic substance or material. Alternatively, as described in more detail below, nucleic acids encoding
proteins or polypeptides with anti-inflammatory activity or nucleic acids regulating the expression of such proteins or
polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins or
polypeptides as desired.

EXAMPLE 31 

Assaying the Expressed Proteins or Polypeptides for Tumor Inhibition Activity

[0337] The proteins or polypeptides of the present invention may also be evaluated for tumor inhibition
activity. In addition to the activities described above for immunological treatment or prevention of tumors, a
protein or polypeptide of the invention may exhibit other anti-tumor activities. A protein or polypeptide may inhibit
tumor growth directly or indirectly (such as, for example, via ADCC). A protein or polypeptide may exhibit its tumor
inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of tissues
necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing production of other
factors, agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors,
agents or cell types which promote tumor growth. Alternatively, as described in more detail below, nucleic acids
encoding proteins or polypeptides with tumor inhibition activity or nucleic acids regulating the expression of such
proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the
proteins or polypeptides as desired.

[0338] A protein or polypeptide of the invention may also exhibit one or more of the following additional
activities or effects: inhibiting the growth, infection or function of, or killing, infectious agents, including,
without limitation, bacteria, viruses, fungi and other parasites; effecting (suppressing or enhancing) bodily
characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or
other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation or
diminution, change in bone form or shape); effecting biorhythms or circadian cycles or rhythms; effecting the
fertility of male or female subjects; effecting the metabolism, catabolism, anabolism, processing, utilization,
storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other
nutritional factors or component(s); effecting behavioral characteristics, including, without limitation, appetite,
libido, stress, cognition (including cognitive disorders), depression (including depressive disorders) and violent
behaviors; providing analgesic effects or other pain reducing effects; promoting differentiation and growth of
embryonic stem cells in lineages other than hematopoietic lineages; hormonal or endocrine activity; in the case of
enzymes, correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of
hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for example,
the ability to bind antigens or complement); and the ability to act as an antigen in a vaccine composition to raise
an immune response against such protein or another material or entity which is cross-reactive with such protein.
Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in any of
the above mentioned activities or nucleic acids regulating the expression of such proteins may be introduced into
appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 32 

Identification of Proteins or Polypeptides which Interact with Proteins or Polypeptides of the Present Invention

[0339] Proteins or polypeptides which interact with the proteins or polypeptides of the present invention, such
as receptor proteins, may be identified using two hybrid systems such as the Matchmaker Two Hybrid System 2
(Catalog No. K1604-1, Clontech). As described in the manual accompanying the kit, nucleic acids encoding the
proteins or polypeptides of the present invention are inserted into an expression vector such that they are in
frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA library
which encode proteins or polypeptides which might interact with the proteins or polypeptides of the present
invention are inserted into a second expression vector such that they are in frame with DNA encoding the
activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are plated on selection
medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4 dependent
expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4
dependent lacZ expression. Those cells which are positive in both the histidine selection and the lacZ assay contain
plasmids encoding proteins or polypeptides which interact with the proteins or polypeptides of the present invention.

[0340] Alternatively, the system described in Lustig et al., Methods in Enzymology 283:83-99 (1997) may be used
for identifying molecules which interact with the proteins or polypeptides of the present invention. In such systems,
in vitro transcription reactions are performed on a pool of vectors containing nucleic acid inserts which encode the
proteins or polypeptides of the present invention. The nucleic acid inserts are cloned downstream of a promoter
which drives in vitro transcription. The resulting pools of mRNAs are introduced into Xenopus laevis oocytes. The
oocytes are then assayed for a desired activity.

[0341] Alternatively, the pooled in vitro transcription products produced as described above may be translated in
vitro. The pooled in vitro translation products can be assayed for a desired activity or for interaction with a known
polypeptide.

[0342] Proteins, polypeptides or other molecules interacting with proteins or polypeptides of the present
invention can be found by a variety of additional techniques. In one method, affinity columns containing the protein
or polypeptide of the present invention can be constructed. In some versions of this method, the affinity column
contains chimeric proteins in which the protein or polypeptide of the present invention is fused to glutathione
S-transferase. A mixture of cellular proteins or pool of expressed proteins as described above is applied to the
affinity column. Molecules interacting with the protein or polypeptide attached to the column can then be isolated
and analyzed on 2-D electrophoresis gel as described in Ramunsen et al., Electrophoresis 18:588-598 (1997).
Alternatively, the molecules retained on the affinity column can be purified by electrophoresis based methods and
sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage
display human antibodies.
[0343] Molecules interacting with the proteins or polypeptides of the present invention can also be screened by

provided it occurs in the evanescent field (which extends a few hundred nanometers from the sensor surface). In
these screening assays, the target molecule can be one of the proteins or polypeptides of the present
invention and the test sample can be a collection of proteins, polypeptides or other moieties extracted from
tissues or cells, a pool of expressed proteins, combinatorial peptide and/or chemical libraries, or phage displayed
peptides. The tissues or cells from which the test molecules are extracted can originate from any species.

[0344] In other methods, a target protein or polypeptide is immobilized and the test population is a collection
of unique proteins or polypeptides of the present invention.

[0345] To study the interaction of the proteins or polypeptides of the present invention with drugs, the
microdialysis coupled to HPLC method described by Wang et al., Chromatographia 44:205-208 (1997) or the affinity
capillary electrophoresis method described by Busch et al., J. Chromatogr. 777:311-328 (1997) can be used.

The system described in U.S. Patent No. 5,654,150 may also be used to identify molecules which interact
with the proteins or polypeptides of the present invention. In this system, pools of cDNAs are transcribed and
translated in vitro and the reaction products are assayed for interaction with a known polypeptide or antibody.

It will be appreciated by those skilled in the art that the proteins or polypeptides of the present
invention may be assayed for numerous activities in addition to those specifically enumerated above. For example,
the expressed proteins or polypeptides may be evaluated for applications involving control and regulation of

orXSer a °y r be P ^^^ — ~ In addition, th^Zinl 

The antibodies may be monoclonal
antibodies or polyclonal antibodies. As used herein, "antibody" refers to a polypeptide or group of polypeptides
which are comprised of at least one binding domain, where a binding domain is formed from the

As used herein, "antigenic determinant" is the portion of an antigen molecule that determines the
specificity of the antigen-antibody reaction. An "epitope" refers to an antigenic determinant of a polypeptide. An
epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an
epitope consists of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for
determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional nuclear magnetic
resonance, and epitope mapping, e.g. the Pepscan method described by H. Mario Geysen et al. 1984. Proc. Natl. Acad.
Sci. U.S.A. 81:3998-4002; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.
In some embodiments, the antibodies may be capable of specifically binding to a protein or polypeptide
encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of EST-related nucleic acids. In some embodiments, the antibody
may be capable of binding to an EST-related polypeptide, fragment of an EST-related polypeptide, positional
segment of an EST-related polypeptide or fragment of a positional segment of an EST-related polypeptide. In some
embodiments, the antibody may be capable of binding an antigenic determinant or an epitope of an EST-related
polypeptide, fragment of an EST-related polypeptide, positional segment of an EST-related polypeptide or fragment
of a positional segment of an EST-related polypeptide.

In the case of secreted proteins, the antibodies may be capable of binding a full-length protein encoded
EXAMPLE 33 

Production of an Antibody to a Human Polypeptide or Protein

[0353] The above described EST-related nucleic acids, fragments of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or nucleic

[0354] In the case of secreted proteins, nucleic acids encoding the full protein (i.e. the mature protein and
the signal peptide), nucleic acids encoding the mature protein (i.e. the protein generated by cleavage of the signal
peptide), or nucleic acids encoding the signal peptide are operably linked to promoters and introduced into cells as
described above.

The encoded proteins or polypeptides are then substantially purified or isolated as described above. The
concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few µg/ml. Monoclonal or polyclonal antibody to the protein or polypeptide can then be
prepared as follows:

1. Monoclonal Antibody Production by Hybridoma Fusion

are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures such as

described in Davis, L. et al. in Basic Methods in Molecular Biology, Elsevier, New York. Section 21-2.

2. Polyclonal Antibody Production by Immunization

[0357] Polyclonal antiserum containing antibodies to heterogenous epitopes of a single protein or polypeptide
can be prepared by immunizing suitable animals with the expressed protein or peptides derived therefrom, which can
be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many
factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic
than others and may require the use of carriers and adjuvant. Also, host animals vary in their response depending on
the site of inoculation and the dose, with both inadequate and excessive doses of antigen resulting in low titer
antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An
effective immunization protocol for rabbits can be found in Vaitukaitis et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
[0358] Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof,
as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of
the antigen, begins to fall. See, for example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental
Immunology, D. Wier (ed.), Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2
mg/ml of serum (about 12 µM). Affinity of the antisera for the antigen is determined by preparing competitive
binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and
Friedman, Eds.) Amer. Soc. For Microbiol., Washington, D.C. (1980).

[0359] Antibody preparations prepared according to either of the above protocols are useful in a variety of 
contexts. In particular, the antibodies may be used in immunoaffinity chromatography techniques such as those 
described below to facilitate large scale isolation, purification, or enrichment of the proteins or polypeptides 
encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids or for the isolation, purification or enrichment of EST-related polypeptides, 
fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional
segments of EST-related polypeptides.

[0360] In the case of secreted proteins, the antibodies may be used for the isolation, purification, or
enrichment of the full protein (i.e. the mature protein and the signal peptide), the mature protein (i.e. the
protein generated by cleavage of the signal peptide), or the signal peptide.

[0361] Additionally, the antibodies may be used in immunoaffinity chromatography techniques such as those 
described below to isolate, purify, or enrich polypeptides which have been linked to the proteins or polypeptides 
encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids or to isolate, purify, or enrich EST-related polypeptides, fragments of EST- 
related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST- 
related polypeptides. 

[0362] The antibodies may also be used to determine the cellular localization of the
proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids or the cellular localization of EST-related 
polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of 
positional segments of EST-related polypeptides. 

[0363] In addition, the antibodies may also be used to determine the cellular localization of polypeptides which 
have been linked to the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST- 
related nucleic acids or fragments of positional segments of EST-related nucleic acids or polypeptides which have
been linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related
polypeptides or fragments of positional segments of EST-related polypeptides.

[0364] The antibodies may also be used in quantitative immunoassays which determine concentrations of antigen- 
bearing substances in biological samples; they may also be used semi-quantitatively or qualitatively to identify the
presence of antigen in a biological sample or to identify the type of tissue present in a biological sample. The 
antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the 
levels of the protein in the body. 

V. Use of 5' ESTs and Consensus Contigated 5' ESTs or Sequences Obtainable Therefrom or Portions Thereof
as Reagents

[0365] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may be used as reagents in isolation procedures, diagnostic assays, 
and forensic procedures. For example, sequences from the EST-related nucleic acids, positional segments of EST- 
related nucleic acids or fragments of positional segments of EST-related nucleic acids, may be detectably labeled 
and used as probes to isolate other sequences capable of hybridizing to them. In addition, the EST-related
nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids may be used to design PCR primers to be used in isolation, diagnostic, or forensic procedures. 

1. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids in isolation, diagnostic and forensic procedures

EXAMPLE 34 

Preparation of PCR Primers and Amplification of DNA

[0366] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be used to prepare PCR primers for a variety of applications,
including isolation procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic
techniques and forensic techniques. In some embodiments, the PCR primers are at least 10, 15, 18, 20, 23, 25, 28, 30, 40,
or 50 nucleotides in length. In some embodiments, the PCR primers may be more than 30 bases in length. It is
preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are
approximately the same. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR
technology, see Molecular Cloning to Genetic Engineering, White, B.A. Ed., in Methods in Molecular Biology 67: Humana
Press, Totowa 1997. In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be
amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such
as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR
primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers
are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are
repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer
sites.
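
By way of illustration only, the stated preference for primer pairs with approximately the same G/C ratio and
melting temperature can be checked computationally before synthesis. The following Python sketch is not part of the
disclosed procedures. The primer sequences, the function names and the 4 degree tolerance are hypothetical, and the
Wallace rule used here (2 degrees C per A or T, 4 degrees C per G or C) is only a rough approximation suited to
short oligonucleotides.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primers_compatible(forward: str, reverse: str, max_tm_gap: int = 4) -> bool:
    """True if the two estimated melting temperatures differ by at most max_tm_gap."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_gap


# Hypothetical 20-mer primers, chosen only to exercise the functions.
forward_primer = "ATGCGTACGTTAGCCTAGGA"
reverse_primer = "TTGACCGATCGGATACCTGA"

print(gc_fraction(forward_primer), wallace_tm(forward_primer))
print(primers_compatible(forward_primer, reverse_primer))
```

A pair that fails such a check would simply be redesigned; more accurate nearest-neighbor estimates could be
substituted for the Wallace rule without changing the overall approach.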

EXAMPLE 35 

Use of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids as probes

[0367] Probes derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic acids may be labeled with detectable labels familiar to 
those skilled in the art, including radioisotopes and non-radioactive labels, to provide a detectable probe. The 
detectable probe may be single stranded or double stranded and may be made using techniques known in the art, 
including in vitro transcription, nick translation, or kinase reactions. A nucleic acid sample containing a sequence 
capable of hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample 
is double stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid sample 
may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise 
nucleic acids obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples. 
[0368] Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable probe 
include well known techniques such as Southern blotting, Northern blotting, dot blotting, colony hybridization, and 
plaque hybridization. In some applications, the nucleic acid capable of hybridizing to the labeled probe may be 
cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription vectors to facilitate the 
characterization and expression of the hybridizing nucleic acids in the sample. For example, such techniques may be 
used to isolate and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the 
detectable probe as described in Example 18 above. 
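
As a purely illustrative aid, the notion of a sample sequence "capable of hybridizing" to a single stranded probe
can be modelled in silico as a search for the probe's reverse complement. The Python sketch below assumes perfect
complementarity and ignores stringency, mismatches and secondary structure; the sequences and function names are
hypothetical and are not taken from the disclosure.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridization_sites(probe: str, target_strand: str) -> List[int]:
    """0-based positions on target_strand where the probe could anneal
    with perfect complementarity."""
    site = reverse_complement(probe)
    target = target_strand.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Hypothetical sample strand and a probe complementary to part of it.
sample_strand = "GGATCCATGCGTACGTTAGCAAGCTT"
probe = reverse_complement("ATGCGTACGT")

print(hybridization_sites(probe, sample_strand))
```

An empty list corresponds to a sample in which no perfectly complementary sequence is present.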

[0369] PCR primers made as described in Example 34 above may be used in forensic analyses, such as the DNA 
fingerprinting techniques described in Examples 36-40 below. Such analyses may utilize detectable probes or primers
based on the sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic acids. 

EXAMPLE 36 

Forensic Matching by DNA Sequencing 

[0370] In one exemplary method, DNA samples are isolated from forensic specimens of, for example, hair, semen, 
blood or skin cells by conventional methods. A panel of PCR primers based on a number of the EST-related nucleic 
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids is then utilized in accordance with Example 34 to amplify DNA of approximately 100-200 bases in length from 
the forensic specimen. Corresponding sequences are obtained from a test subject. Each of these identification DNAs 
is then sequenced using standard techniques, and a simple database comparison determines the differences, if any, 
between the sequences from the subject and those from the sample. Statistically significant differences between the 
suspect's DNA sequences and those from the sample conclusively prove a lack of identity. This lack of identity can 
be proven, for example, with only one sequence. Identity, on the other hand, should be demonstrated with a large 
number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in 
length are used to prove identity between the suspect and the sample. 
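
The decision logic of this example (a single differing sequence excludes identity, whereas identity requires a
large number of matching sequences) can be sketched as follows. This Python fragment is a simplified illustration
under stated assumptions: sequences are compared as exact strings, sequencing error and statistical significance
are ignored, and the locus names, function name and return labels are hypothetical.

```python
from typing import Dict


def compare_panels(sample: Dict[str, str], subject: Dict[str, str],
                   min_loci: int = 50) -> str:
    """Compare amplified sequences locus by locus.

    Returns "excluded" if any shared locus differs, "match" if at least
    min_loci shared loci are all identical, and "inconclusive" otherwise.
    """
    shared = sorted(set(sample) & set(subject))
    mismatched = [locus for locus in shared
                  if sample[locus].upper() != subject[locus].upper()]
    if mismatched:
        return "excluded"
    if len(shared) >= min_loci:
        return "match"
    return "inconclusive"


# Hypothetical two-locus panels, far smaller than the preferred minimum of 50.
specimen = {"locus1": "ACGTTGCA", "locus2": "GGCCTTAA"}
suspect = {"locus1": "ACGTTGCA", "locus2": "GGCCTTAT"}

print(compare_panels(specimen, suspect, min_loci=2))
```

With the default threshold of 50 loci, two fully matching panels of only two loci would be reported as
inconclusive rather than as a match, mirroring the asymmetry described above.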

EXAMPLE 37 

Positive Identification by DNA Sequencing

[0371] The technique outlined in the previous example may also be used on a larger scale to provide a unique 
fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids. Preferably, 20 to 50 different primers are used. These primers are used to obtain a
corresponding number of PCR-generated DNA segments from the individual in question in accordance with Example
34. Each of these DNA segments is sequenced, using the methods set forth in Example 36. The database of
sequences generated through this procedure uniquely identifies the individual from whom the sequences were
obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other
biological specimen with that individual.



EXAMPLE 38 

Southern Blot Forensic Identification 



[0372] The procedure of Example 37 is repeated to obtain a panel of at least 10 amplified sequences from an
individual and a specimen. Preferably, the panel contains at least 50 amplified sequences. More preferably, the
panel contains 100 amplified sequences. In some embodiments, the panel contains 200 amplified sequences. This
PCR-generated DNA is then digested with one or a combination of, preferably, four base specific restriction enzymes.
Such enzymes are commercially available and known to those of skill in the art. After digestion, the resultant gene
fragments are size separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using
Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting see Davis et
al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp 62-65).

[0373] A panel of probes based on the sequences of the EST-related nucleic acids, positional segments of EST-
related nucleic acids or fragments of positional segments of EST-related nucleic acids are radioactively or
colorimetrically labeled using methods known in the art, such as nick translation or end labeling, and hybridized

is 6 ^n^f n ,fl bl0 4 U « 9 ]n Ch C n qU 7 e c S T n Wn in the art (Davis etal - su P ra > P^^^^^^^^^, 
?A 3 ^ 35 ' 40 ' 50 ' 75 ' 100 ' 150 - 200 ' 30 °- 400 or 500 nucleotides in length. Preferably the probes are 

t^l !'h 12 ' ^l 8 - 2 °i 25 ' 2 \ 30 ' 35 ' 4 °' 50 ' 75 ' 100 ' 150 ' 200 ' 300 ' 400 ° r 5 °0 nucleoMes^lengTh. fn 
some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less 

[0374] Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or
30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a large sample of
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids will be a unique identifier. Since the restriction enzyme cleavage will be different for
every individual, the band pattern on the Southern blot will also be unique. Increasing the number of probes will
provide a statistically higher level of confidence in the identification since there will be an increased number of
sets of bands used for identification.
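
The way in which a sequence difference at a restriction site changes the band pattern can be illustrated with a
small in silico digest. The Python sketch below is an assumption-laden simplification and not the disclosed
procedure: it uses the four base recognition site GGCC (cut between GG and CC, as for HaeIII) as a representative
four base specific enzyme, treats a probe as matching the strand shown rather than its complement, and uses
hypothetical 20 base sequences in place of amplified DNA.

```python
from typing import List


def digest(dna: str, site: str = "GGCC", cut_offset: int = 2) -> List[str]:
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    seq = dna.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def band_pattern(dna: str, probes: List[str]) -> List[int]:
    """Sorted sizes of the fragments that contain at least one probe sequence."""
    fragments = digest(dna)
    return sorted(len(f) for f in fragments
                  if any(p.upper() in f for p in probes))


# Hypothetical sequences from two individuals; the second lacks one GGCC site.
individual_a = "AAAGGCCTTTTTGGCCAAAA"
individual_b = "AAAGGCCTTTTTGGTCAAAA"

print(band_pattern(individual_a, ["TTTTT"]))
print(band_pattern(individual_b, ["TTTTT"]))
```

Because the second sequence has lost one recognition site, the probe detects a larger fragment, which is the
kind of difference that the band pattern on the blot would reveal; adding probes adds further sets of bands to compare.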



EXAMPLE 39 

Dot Blot Identification Procedure



[0375] Another technique for identifying individuals using the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids disclosed herein utilizes
a dot blot hybridization technique.

[0376] Genomic DNA is isolated from nuclei of the subject to be identified. Probes are prepared that correspond to
at least 10, preferably 50 sequences from the EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic acids. The probes are used to hybridize to the
genomic DNA through conditions known to those in the art. The oligonucleotides are end labeled with 32P using
polynucleotide kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like
using a vacuum dot blot manifold (BioRad, Richmond, California). The nitrocellulose filter containing the genomic
sequences is baked or UV linked to the filter, prehybridized and hybridized with labeled probe using techniques
known in the art (Davis et al., supra). The 32P labeled DNA fragments are sequentially hybridized with successively
stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium
chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl.
Acad. Sci. USA 82(6):1585-1588 (1985)). A unique pattern of dots distinguishes one individual from another individual.

EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids can be used as probes in the following alternative fingerprinting technique.
In some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less.

Preferably, a plurality of probes having sequences from different EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are used in
the alternative fingerprinting technique. Example 40 below provides a representative alternative fingerprinting
procedure in which the probes are derived from EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic acids.

EXAMPLE 40 

Alternative "Fingerprint" Identification Technique

[0379] Oligonucleotides are prepared from a large number, e.g. 50, 100, or 200, EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids
using commercially available oligonucleotide services such as Genset, Paris, France. Preferably, the
oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28, or 30 nucleotides in length. However, in some embodiments,
the oligonucleotides may be more than 30 nucleotides in length.

[0380] Samples from the test subject are processed for DNA using techniques well known to those with skill
in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI. Following digestion,
samples are applied to wells for electrophoresis. The procedure, as known in the art, may be modified to accommodate
polyacrylamide electrophoresis; however, in this example, samples containing 5 µg of DNA are loaded into wells and
separated on gels. The gels are transferred onto nitrocellulose using standard Southern blotting techniques.

[0381] 10 ng of each of the oligonucleotides are pooled and end-labeled with 32P. The nitrocellulose is
prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization and washing,
the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for
each individual.

[0382] It is additionally contemplated within this example that the number of probe sequences used can be varied.

[0383] In addition to their applications in forensics and identification, EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be
mapped to their chromosomal locations. Example 41 below describes radiation hybrid (RH) mapping of human
chromosomal regions using EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids. Example 42 below describes a representative procedure for
mapping EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to their locations on human chromosomes. Example 43 below describes
mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids on metaphase chromosomes by Fluorescence In Situ Hybridization (FISH).

2. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids in Chromosome Mapping

EXAMPLE 41 

Radiation hybrid mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids to the human genome

[0384] Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high resolution
mapping of the human genome. In this approach, cell lines containing one or more human chromosomes are lethally
irradiated, breaking each chromosome into fragments whose size depends on the radiation dose. These fragments are
rescued by fusion with cultured rodent cells, yielding subclones containing different portions of the human genome.
This technique is described by Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science 250:245-250, 1990).
The random and independent nature of the subclones permits efficient mapping of any human genome marker.
Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering EST-related nucleic
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic
acids. In this approach, the frequency of breakage between markers is used to measure distance, allowing
construction of fine resolution maps as has been done using conventional ESTs (Schuler et al., Science 274:540-546,
1996).

[0385] RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human
chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al.,
Genomics 33), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245,
1996), loci covering the entire short arm of chromosome 12 (Raeymaekers et al.), the region of human chromosome 22
containing the neurofibromatosis type 2 locus, and 13 loci on the long arm of chromosome 5 (Warrington et al.).
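
How a breakage frequency becomes a map distance can be sketched as follows. This is a minimal illustration, not
the procedure of this example: it assumes the two-point estimator commonly used in radiation hybrid work (discordant
hybrids divided by N(R_A + R_B - 2 R_A R_B), with distance -ln(1 - theta) expressed in centiRays), and the function
names and retention data are hypothetical.

```python
import math


def breakage_frequency(marker_a, marker_b):
    """Estimate the breakage frequency (theta) between two markers.

    marker_a, marker_b: sequences of 0/1 values, one per hybrid cell line,
    where 1 means the marker was retained in that hybrid.
    """
    if len(marker_a) != len(marker_b) or len(marker_a) == 0:
        raise ValueError("markers must be typed on the same non-empty panel")
    n = len(marker_a)
    # Hybrids that retain exactly one of the two markers imply a break.
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if a != b)
    r_a = sum(marker_a) / n  # retention fraction of marker A
    r_b = sum(marker_b) / n  # retention fraction of marker B
    expected = n * (r_a + r_b - 2 * r_a * r_b)
    if expected == 0:
        raise ValueError("retention pattern carries no linkage information")
    return min(discordant / expected, 1.0)


def distance_centirays(theta):
    """Convert a breakage frequency into a distance in centiRays."""
    if theta >= 1.0:
        return math.inf  # markers behave as unlinked
    return -100.0 * math.log(1.0 - theta)


if __name__ == "__main__":
    # Hypothetical retention patterns across an 8-hybrid panel.
    est = [1, 1, 0, 0, 1, 0, 1, 0]
    sts = [1, 1, 0, 0, 1, 0, 0, 1]
    theta = breakage_frequency(est, sts)
    print(theta, distance_centirays(theta))
```

A real panel of 80-100 hybrids would give a far less noisy estimate than the eight hypothetical hybrids used here.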

EXAMPLE 42 

Mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Human Chromosomes using PCR techniques

[0386] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids may be assigned to human chromosomes using PCR based methodologies. In
such approaches, oligonucleotide primer pairs are designed from EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to minimize the chance of
amplifying through an intron. Preferably, the oligonucleotide primers are 18-23 bp in length and are designed for
PCR amplification. The creation of PCR primers from known sequences is well known to those with skill in the art.
For a review of PCR technology see Erlich, in PCR Technology; Principles and Applications for DNA Amplification,
1992. W.H. Freeman and Co., New York.

[0387] The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic
DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with 80 ng of each
oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR
is performed in a microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C,
2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6%
polyacrylamide sequencing gel and visualized by autoradiography. If the length of the resulting PCR product is
identical to the distance between the ends of the primer sequences in the 5' EST from which the primers are derived,
then the PCR reaction is repeated with DNA templates from two panels of human-rodent somatic cell hybrids, BIOS
PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS,
Camden, NJ).
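
The cycling conditions above can be written down as data, which makes the programmed hold time easy to check. The
sketch below is illustrative only: the class and function names are hypothetical, and ramp times between
temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    minutes: float  # hold time at that temperature


# Values taken from the conditions stated in the paragraph above.
CYCLE = (Step(94, 1.4), Step(55, 2.0), Step(72, 2.0))
N_CYCLES = 30
FINAL_EXTENSION = Step(72, 10.0)


def total_hold_minutes(cycle, n_cycles, final_extension):
    """Sum the programmed hold times, ignoring ramping between steps."""
    return n_cycles * sum(step.minutes for step in cycle) + final_extension.minutes


if __name__ == "__main__":
    print(total_hold_minutes(CYCLE, N_CYCLES, FINAL_EXTENSION))
```

With these values the programmed holds add up to roughly 172 minutes, so a run takes about three hours once ramping
is included.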

[0388] PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human
chromosomes for the presence of a given 5' EST. DNA is isolated from the somatic hybrids and used as starting
templates for PCR reactions using the primer pairs from the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Only those somatic cell
hybrids with chromosomes containing the human gene corresponding to the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids will yield an
amplified fragment. The 5' ESTs are assigned to a chromosome by analysis of the segregation pattern of PCR products
from the somatic hybrid DNA templates. The single human chromosome present in all cell hybrids that give rise to an
amplified fragment is the chromosome containing that EST-related nucleic acid, positional segment of EST-related
nucleic acid or fragment of a positional segment of EST-related nucleic acid. For a review of techniques and
analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6:475-481 (1990).
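
The segregation analysis described above amounts to a concordance test between PCR results and the known chromosome
content of each hybrid. The sketch below shows one way to express it. The panel contents, hybrid names and function
name are hypothetical and do not describe the BIOS or NIGMS panels.

```python
def concordant_chromosomes(pcr_positive, hybrid_chromosomes):
    """Return chromosomes whose presence tracks the PCR result in every hybrid.

    pcr_positive: dict mapping hybrid name -> True if a product was amplified.
    hybrid_chromosomes: dict mapping hybrid name -> set of human chromosomes
        retained by that hybrid.
    """
    all_chromosomes = set().union(*hybrid_chromosomes.values())
    matches = []
    for chrom in sorted(all_chromosomes):
        # A candidate must be present in every positive hybrid and absent
        # from every negative hybrid.
        if all((chrom in hybrid_chromosomes[h]) == result
               for h, result in pcr_positive.items()):
            matches.append(chrom)
    return matches


if __name__ == "__main__":
    # Hypothetical four-hybrid panel.
    panel = {
        "H1": {"1", "5", "17"},
        "H2": {"5", "X"},
        "H3": {"17", "2"},
        "H4": {"1", "X"},
    }
    results = {"H1": True, "H2": False, "H3": True, "H4": False}
    print(concordant_chromosomes(results, panel))
```

If more than one chromosome is returned, the panel does not discriminate between them and further hybrids are needed.
An empty result suggests a typing error or a sequence present on more than one chromosome.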
[0389] Alternatively, the EST-related nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids may be mapped to individual chromosomes using FISH
as described in Example 43 below.

EXAMPLE 43 



Mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Chromosomes Using Fluorescence In Situ Hybridization

[0390] Fluorescence in situ hybridization allows the EST-related nucleic acids, positional segments of EST- 
related nucleic acids or fragments of positional segments of EST-related nucleic acids to be mapped to a particular 
location on a given chromosome. The chromosomes to be used for fluorescence in situ hybridization techniques may be 
obtained from a variety of sources including cell cultures, tissues, or whole blood. 

[0391] In a preferred embodiment, chromosomal localization of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is obtained by FISH as
described by Cherif et al. (Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643, 1990). Metaphase chromosomes are prepared
from phytohemagglutinin (PHA)-stimulated blood cell donors. PHA-stimulated lymphocytes from healthy males are
cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h followed by
addition of 5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min before
harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of KCl (75 mM) at
37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). The cell suspension is dropped onto a
glass slide and air dried. The EST-related nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids are labeled with biotin-16 dUTP by nick translation
according to the manufacturer's instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a
Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and precipitated. Just prior to hybridization, the DNA pellet is
dissolved in hybridization buffer (50% formamide, 2 X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm
DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.

[0392] Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2 X SSC
and dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2 X SSC for 2 min at
70°C, then dehydrated at 4°C. The slides are treated with proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2)
at 37°C for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered
with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization
and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with additional
layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal localization, fluorescent R-bands are
obtained as previously described (Cherif et al., supra). The slides are observed under a LEICA fluorescence
microscope (DMRXA). Chromosomes are counterstained with propidium iodide and the fluorescent signal of the probe
appears as two symmetrical yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus,
a particular EST-related nucleic acid, positional segment of EST-related nucleic acid or fragment of a positional
segment of EST-related nucleic acid may be localized to a particular cytogenetic R-band on a given chromosome.
Once the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids have been assigned to particular chromosomes using the techniques described
in Examples 41-43 above, they may be utilized to construct a high resolution map of the chromosomes on which they
are located or to identify the chromosomes in a sample.

EXAMPLE 44 

Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Construct or Expand Chromosome Maps

[0393] Chromosome mapping involves assigning a given unique sequence to a particular chromosome as
described above. Once the unique sequence has been mapped to a given chromosome, it is ordered relative to other
unique sequences located on the same chromosome. One approach to chromosome mapping utilizes a series of yeast
artificial chromosomes (YACs) bearing several thousand long inserts derived from the chromosomes of the organism
from which the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are obtained. This approach is described in Ramaiah Nagaraja et al.,
Genome Research 7:210-222, March 1997. Briefly, in this approach each chromosome is broken into overlapping
pieces which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to determine
whether they include the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids whose position is to be determined. Once an insert has been found
which includes the 5' EST, the insert can be analyzed by PCR or other methods to determine whether the insert also
contains other sequences known to be on the chromosome or in the region from which the EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids
were derived. This process can be repeated for each insert in the YAC library to determine the location of each of
the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments
of EST-related nucleic acids relative to one another and to other known chromosomal markers. In this way, a high
resolution map of the distribution of numerous unique markers along each of the organism's chromosomes may be
obtained.
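
The screening step can be pictured as a lookup over the marker content of each YAC insert. The following sketch
assumes the PCR screening results have already been tabulated. The YAC identifiers, marker names and function name
are hypothetical, and the code only reports co-occurrence; it does not order the markers.

```python
def markers_sharing_insert(yac_content, query):
    """Find known markers that occur on the same YAC insert as the query.

    yac_content: dict mapping YAC identifier -> set of markers detected in
        that insert (for example by PCR screening).
    query: the sequence whose neighbourhood is being examined.
    Returns a dict mapping each co-occurring marker -> list of supporting YACs.
    """
    linked = {}
    for yac, markers in yac_content.items():
        if query in markers:
            for marker in sorted(markers - {query}):
                linked.setdefault(marker, []).append(yac)
    return linked


if __name__ == "__main__":
    # Hypothetical screening table.
    library = {
        "Y1": {"EST_A", "STS1"},
        "Y2": {"EST_A", "STS1", "STS2"},
        "Y3": {"STS2", "STS3"},
    }
    print(markers_sharing_insert(library, "EST_A"))
```

Markers supported by several overlapping inserts are likely to lie closest to the query, which is the information
a map-building step would then use to order them.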

[0394] As described in Example 45 below, EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic acids may also be used to identify genes associated
with a particular phenotype, such as hereditary disease or drug response.

3. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids in Gene Identification

EXAMPLE 45 

Identification of genes associated with hereditary diseases or drug response 

[0395] This example illustrates an approach useful for the association of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids with
particular phenotypic characteristics. In this example, a particular EST-related nucleic acid, positional segment
of EST-related nucleic acid or fragment of a positional segment of EST-related nucleic acid is used as a test
probe to associate that EST-related nucleic acid, positional segment of EST-related nucleic acid or fragment of a
positional segment of EST-related nucleic acid with a particular phenotypic characteristic.

[0396] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are mapped to a particular location on a human chromosome using techniques
such as those described in Examples 41 and 42 or other techniques known in the art. A search of Mendelian
Inheritance in Man (V. McKusick, Mendelian Inheritance in Man (available on line through Johns Hopkins University
Welch Medical Library)) reveals the region of the human chromosome which contains the EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to
be a very gene rich region containing several known genes and several diseases or phenotypes for which genes have
not been identified. The gene corresponding to this EST-related nucleic acid, positional segment of EST-related
nucleic acid or fragment of a positional segment of EST-related nucleic acid thus becomes an immediate candidate
for each of these genetic diseases.

[0397] Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR primers
from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are used to screen genomic DNA, mRNA or cDNA obtained from the patients.
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids that are not amplified in the patients can be positively associated with a particular
disease by further analysis. Alternatively, the PCR analysis may yield fragments of different lengths when the
samples are derived from an individual having the phenotype associated with the disease than when the sample is
derived from a healthy individual, indicating that the gene containing the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be
responsible for the genetic disease.
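
The two outcomes described above, no amplification in the patient or a product of different length, can be captured
in a simple comparison. This is an illustrative sketch only: the function name, the use of `None` for "no product"
and the size tolerance are assumptions, not part of the example.

```python
def classify_amplicon(patient_bp, control_bp, tolerance_bp=0):
    """Compare a patient PCR product with the product from a healthy control.

    patient_bp / control_bp: product length in base pairs, or None when no
        product was amplified.
    tolerance_bp: sizing error allowed before two lengths count as different.
    """
    if control_bp is None:
        raise ValueError("the control must amplify for the comparison to be valid")
    if patient_bp is None:
        return "absent"          # candidate deletion or primer-site change
    if abs(patient_bp - control_bp) > tolerance_bp:
        return "length_differs"  # candidate insertion or deletion
    return "same"                # no difference detectable by this assay


if __name__ == "__main__":
    print(classify_amplicon(None, 250))
    print(classify_amplicon(310, 250, tolerance_bp=5))
    print(classify_amplicon(252, 250, tolerance_bp=5))
```

A result of "same" does not exclude the gene, since point mutations would not change the product length.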

VI. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to Construct Vectors

[0398] The present EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may also be used to construct secretion vectors capable of
directing the secretion of the proteins encoded by genes therein. Such secretion vectors may facilitate the
purification or enrichment of the proteins encoded by genes inserted therein by reducing the number of background
proteins from which the desired protein must be purified or enriched. Exemplary secretion vectors are described in
Example 46 below.

1. Construction of secretion vectors 

EXAMPLE 46 

Construction of Secretion Vectors 

[0399] The secretion vectors of the present invention include a promoter capable of directing gene expression in
the host cell, tissue, or organism of interest. Such promoters include the Rous Sarcoma Virus promoter, the SV40
promoter, the human cytomegalovirus promoter, and other promoters familiar to those skilled in the art.

[0400] A signal sequence from one of the EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic acids is operably linked to the promoter such that
the mRNA transcribed from the promoter will direct the translation of the signal peptide. Preferably, the signal
sequence is from one of the nucleic acids of SEQ ID NOs.:24-4100. The host cell, tissue, or organism may be any cell,
tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the EST-related nucleic
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic
acids. Suitable hosts include mammalian cells, tissues or organisms, avian cells, tissues or organisms, insect
cells, tissues or organisms, or yeast.

[0401] In addition, the secretion vector contains cloning sites for inserting genes encoding the proteins which
are to be secreted. The cloning sites facilitate the cloning of the insert gene in frame with the signal sequence
such that a fusion protein in which the signal peptide is fused to the protein encoded by the inserted gene is expressed
from the mRNA transcribed from the promoter. The signal peptide directs the extracellular secretion of the fusion
protein.
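
The requirement that the insert be cloned in frame with the signal sequence can be checked computationally before
any cloning is attempted. The sketch below uses the standard genetic code. The sequences shown are hypothetical
placeholders rather than sequences of SEQ ID NOs.:24-4100, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fuse_in_frame(signal_seq, insert_orf):
    """Join a signal sequence to an insert and confirm the fusion reads through.

    signal_seq: coding sequence of the signal peptide, starting at ATG and
        containing no stop codon.
    insert_orf: coding sequence of the protein to be secreted.
    """
    if len(signal_seq) % 3 != 0:
        raise ValueError("signal sequence would shift the insert out of frame")
    protein = translate(signal_seq + insert_orf)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon in the fusion")
    return protein


if __name__ == "__main__":
    # Hypothetical 3-codon signal and 3-codon insert (last codon is a stop).
    print(fuse_in_frame("ATGAAGCTG", "GCTGATTAA"))
```

A check of this kind catches frame shifts introduced by the cloning site itself, which is why the linker bases
between the signal sequence and the insert should be included in `signal_seq` when it is applied to a real design.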

[0402] The secretion vector may be DNA or RNA and may integrate into the chromosome of the host, be stably
maintained as an extrachromosomal replicon in the host, be an artificial chromosome, or be transiently present in
the host. Preferably, the secretion vector is maintained in multiple copies in each host cell. As used herein,
multiple copies means at least 2, 5, 10, 20, 25, 50 or more than 50 copies per cell. In some embodiments, the
multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result from
amplification of a chromosomal sequence.

[0403] Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in the art,
including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating plasmids, yeast
episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P element vectors, baculovirus
vectors, or bacterial plasmids capable of being transiently introduced into the host.

[0404] The secretion vector may also contain a polyA signal such that the polyA signal is located downstream of 
the gene inserted into the secretion vector. 

[0405] After the gene encoding the protein for which secretion is desired is inserted into the secretion vector,
the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate precipitation,
DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked DNA. The protein encoded
by the inserted gene is then purified or enriched from the supernatant using conventional techniques such as
ammonium sulfate precipitation, immunoprecipitation, immunoaffinity chromatography, size exclusion chromatography,
ion exchange chromatography, and HPLC. Alternatively, the secreted protein may be in a sufficiently enriched or pure
state in the supernatant or growth media of the host to permit it to be used for its intended purpose without
further enrichment.

[0406] The signal sequences may also be inserted into vectors designed for gene therapy. In such vectors, the
signal sequence is operably linked to a promoter such that mRNA transcribed from the promoter encodes the signal
peptide. A cloning site is located downstream of the signal sequence such that a gene encoding a protein whose
secretion is desired may readily be inserted into the vector and fused to the signal sequence. The vector is
introduced into an appropriate host cell. The protein expressed from the promoter is secreted extracellularly,
thereby producing a therapeutic effect.

EXAMPLE 47 

Fusion Vectors

[0407] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be used to construct fusion vectors for the expression of
chimeric polypeptides. The chimeric polypeptides comprise a first polypeptide portion and a second polypeptide
portion. In the fusion vectors of the present invention, nucleic acids encoding the first polypeptide portion and
the second polypeptide portion are joined in frame with one another so as to generate a nucleic acid encoding the
chimeric polypeptide. The nucleic acid encoding the chimeric polypeptide is operably linked to a promoter which
directs the expression of an mRNA encoding the chimeric polypeptide. The promoter may be in any of the expression
vectors described herein including those described in Examples 20 and 46.

[0408] Preferably, the fusion vector is maintained in multiple copies in each host cell. In some embodiments,
the multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result
from amplification of a chromosomal sequence.

[0409] The first polypeptide portion may comprise any of the polypeptides encoded by the EST-related nucleic
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic
acids. Alternatively, the first polypeptide portion may be one of the EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of
EST-related polypeptides.

[0410] The second polypeptide may comprise any polypeptide of interest. In some embodiments, the
second polypeptide

which facilitate the expression of the chimeric polypeptide. Where appropriate, the cells are treated with a
detection reagent which is visible under the microscope following a catalytic reaction with the detectable

appropriate wavelength to cause the green fluorescent protein or modified version thereof to fluoresce.

[0411] Alternatively, the second polypeptide portion may comprise a polypeptide whose isolation, purification,
or enrichment is desired. In such embodiments, the isolation, purification, or enrichment of the second polypeptide
portion may be achieved by performing the immunoaffinity chromatography procedures described below using an
immunoaffinity column having an antibody directed against the first polypeptide portion coupled thereto.

[0412] The proteins encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related nucleic acids or the EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of
EST-related polypeptides may also be used to generate antibodies as explained in Examples 20 and 33 in order to
identify the tissue type or cell species from which a sample is derived as described in Example 48.

EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means of Labeled Tissue Specific Antibodies

[0413] Identification of specific tissues is accomplished by the visualization of tissue specific antigens by
means of antibody preparations according to Examples 20 and 33 which are conjugated, directly or indirectly, to a
detectable marker. Selected labeled antibody species bind to their specific antigen binding partner in tissue
sections, cell suspensions, or in extracts of soluble proteins from a sample to provide a pattern.

[0414] Antisera for these procedures must have a potency exceeding that of the native preparation, and for that
reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, for example, by
ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide the most specific antisera,
unwanted antibodies, for example to common proteins, must be removed from the gamma globulin fraction, for example
a — ™ - - - eTSSSK 

1. Immunohistochemical Techniques

[0415] Purified, high titer antibodies, prepared as described above, are conjugated to a detectable marker, as
described, for example, by Fudenberg, Chap. 26 in: Basic and Clinical Immunology, 3rd Ed. Lange, Los Altos,
California (1980) or Rose et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed. John Wiley and Sons, New York
(1980).

[0416] A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled
with an enzyme that supports a color producing reaction with a substrate, such as horseradish peroxidase. Markers
can be added to tissue-bound antibody in a second step, as described below.

and detected by overlaying the antibody treated preparation with photographic emulsion.

[0417] Preparations to carry out the procedures can comprise monoclonal or polyclonal antibodies to a single
protein or peptide identified as specific to a tissue type, for example, brain tissue, or antibody preparations to
several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as
required.

[0418] Tissue sections and cell suspensions are prepared for immunohistochemical examination according to
common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the unknown tissue and known
control are mounted and each slide covered with different dilutions of the antibody preparation. Sections of known
and unknown tissues should also be treated with preparations to provide a positive control, a negative control, for
example, pre-immune sera, and a control for non-specific staining, for example, buffer.

[0419] Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in
buffer for 30-45 min. Excess fluid is blotted away, and the marker developed.

[0420] If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time
in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against
the immunoglobulin class of the antiserum-producing species, for example, fluorescein labeled antibody to IgG.

[0421] The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of
color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
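
Calibrating a signal against standards is often done with a straight-line fit. The sketch below shows that idea
under stated assumptions: a linear relationship between antigen amount and measured intensity over the range of
the standards, hypothetical standard values, and illustrative function names. A real assay may need a non-linear
curve.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired standards")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def antigen_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate the amount of antigen."""
    if slope == 0:
        raise ValueError("flat calibration line cannot be inverted")
    return (intensity - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: known antigen amounts and measured intensities.
    amounts = [0.0, 1.0, 2.0, 4.0]
    intensities = [10.0, 30.0, 50.0, 90.0]
    slope, intercept = fit_line(amounts, intensities)
    print(antigen_from_intensity(70.0, slope, intercept))
```

Readings that fall outside the range spanned by the standards should be treated as extrapolations and repeated at a
different antibody dilution where possible.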

2. Identification of Tissue Specific Soluble Proteins 

[0422] The visualization of tissue specific proteins and identification of unknown tissues from that procedure
is carried out using the labeled antibody reagents and detection strategy as described for immunohistochemistry;
however, the sample is prepared according to an electrophoretic technique to distribute the proteins extracted from
the tissue in an orderly array on the basis of molecular weight for detection.

[0423] A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce
homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the
practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by
ultracentrifugation, and the soluble protein-containing fraction concentrated if necessary and reserved for analysis.

[0424] A sample of the soluble protein solution is resolved into individual protein species by conventional SDS
polyacrylamide electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in
Molecular Biology, Elsevier, New York (1986), using a range of amounts of polyacrylamide in a set of gels to
resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in
parallel for purposes of estimating molecular weights of the constituent proteins. Sample size for analysis is a
convenient volume of from 5 to 55 µl, and containing from about 1 to 100 µg protein. An aliquot of each of the
resolved proteins is transferred by blotting to a nitrocellulose filter paper, a process that maintains the pattern
of resolution. The procedure, known as Western Blot Analysis, is described in Davis et al. One set of
nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with
the antibody bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more
specific antisera to tissue specific proteins prepared as described in Examples 20 and 33. In this procedure, as in
procedure A above, appropriate positive and negative sample and reagent controls are run.
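
Estimating a molecular weight from a size marker usually relies on the roughly linear relationship between the
logarithm of molecular weight and migration distance. The sketch below interpolates between the two flanking marker
bands. It assumes that relationship holds between neighbouring markers. The ladder values and function name are
hypothetical.

```python
import math


def estimate_mw_kda(ladder, band_mm):
    """Estimate molecular weight by log-linear interpolation on a ladder.

    ladder: list of (migration_mm, kda) pairs for the size marker, with
        distinct migration distances.
    band_mm: migration distance of the band of interest.
    """
    points = sorted(ladder)
    for (d1, mw1), (d2, mw2) in zip(points, points[1:]):
        if d1 <= band_mm <= d2:
            fraction = (band_mm - d1) / (d2 - d1)
            log_mw = math.log10(mw1) + fraction * (math.log10(mw2) - math.log10(mw1))
            return 10 ** log_mw
    raise ValueError("band lies outside the range covered by the size marker")


if __name__ == "__main__":
    # Hypothetical two-band ladder: 100 kDa at 10 mm, 10 kDa at 30 mm.
    print(estimate_mw_kda([(10.0, 100.0), (30.0, 10.0)], 20.0))
```

Bands outside the ladder are rejected rather than extrapolated, which mirrors the reason for using a set of gels
with different polyacrylamide amounts to cover the full molecular weight range.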

[0425] In either procedure described above a detectable label can be attached to the primary tissue antigen-
primary antibody complex according to various strategies and permutations thereof. In a straightforward approach,
the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled
secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin
molecule which can, in a subsequent step, bind an avidin conjugated marker. According to yet another strategy

EXAMPLE 49 

Immunohistochemical Localization of Polypeptides

The antibodies prepared as described in Examples 20 and 33 above may be utilized to determine the
cellular location of a polypeptide. The polypeptide may be any of the polypeptides encoded by EST-related nucleic
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related
nucleic acids. Alternatively, the polypeptide may be one of the EST-related polypeptides, fragments of EST-related
polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related
polypeptides. In some embodiments, the polypeptide may be a chimeric polypeptide such as those encoded by the fusion
vectors of Example 47.

Cells expressing the polypeptide to be localized are applied to a microscope slide and fixed using any of
the procedures typically employed in immunohistochemical localization techniques, including the methods described
in Current Protocols in Molecular Biology, John Wiley and Sons, Inc. 1997. Following a washing step, the cells are
contacted with the antibody. In some embodiments, the antibody is conjugated to a detectable marker as described
above to facilitate detection. Alternatively, in some embodiments, after the cells have been contacted with an
antibody to the polypeptide to be localized, a secondary antibody which has been conjugated to a detectable marker
is placed in contact with the antibody against the polypeptide to be localized.

[0428] Thereafter, microscopy is performed under conditions suitable for visualizing the cellular location of
the polypeptide.

[0429] The visualization of tissue specific antigen binding at levels above those seen in control tissues to one
or more tissue specific antibodies, directed against the polypeptides encoded by EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or
antibodies against the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of
EST-related polypeptides, or fragments of positional segments of EST-related polypeptides, can identify tissues of
unknown origin, for example, forensic samples, or differentiated tumor tissue that has metastasized to foreign
bodily sites.

[0430] The antibodies of Examples 20 and 33 may also be used in the immunoaffinity chromatography techniques
described below to isolate, purify or enrich the polypeptides encoded by the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or to isolate,
purify or enrich EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related
polypeptides, or fragments of positional segments of EST-related polypeptides. The immunoaffinity chromatography 
techniques described below may also be used to isolate, purify or enrich polypeptides which have been linked to the 
polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments 
of positional segments of EST-related nucleic acids or to isolate, purify or enrich polypeptides which have been 
linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related 
polypeptides, or fragments of positional segments of EST-related polypeptides. 

EXAMPLE 50

Immunoaffinity Chromatography

[0431] Antibodies prepared as described above are coupled to a support. Preferably, the antibodies are 
monoclonal antibodies, but polyclonal antibodies may also be used. The support may be any of those typically 
employed in immunoaffinity chromatography, including Sepharose CL-4B (Pharmacia, Piscataway, NJ), Sepharose
CL-2B (Pharmacia, Piscataway, NJ), Affi-gel 10 (Biorad, Richmond, CA), or glass beads.

[0432] The antibodies may be coupled to the support using any of the coupling reagents typically used in 
immunoaffinity chromatography, including cyanogen bromide. After coupling the antibody to the support, the support
is contacted with a sample which contains a target polypeptide whose isolation, purification or enrichment is 
desired. The target polypeptide may be a polypeptide encoded by the EST-related nucleic acids, positional segments 
of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or the target 
polypeptide may be one of the EST-related polypeptides, fragments of EST-related polypeptides, positional segments 
of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. The target 
polypeptides may also be polypeptides which have been linked to the polypeptides encoded by the EST-related nucleic 
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids or the target polypeptides may be polypeptides which have been linked to EST-related polypeptides, fragments 
of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of 
EST-related polypeptides using the fusion vectors described above. 

[0433] Preferably, the sample is placed in contact with the support for a sufficient amount of time and under 
appropriate conditions to allow at least 50% of the target polypeptide to specifically bind to the antibody coupled 
to the support. 

[0434] Thereafter, the support is washed with an appropriate wash solution to remove polypeptides which have 
non-specifically adhered to the support. The wash solution may be any of those typically employed in immunoaffinity 
chromatography, including PBS, Tris-lithium chloride buffer (0.1 M lysine base and 0.5 M lithium chloride, pH 8.0),
Tris-hydrochloride buffer (0.05 M Tris-hydrochloride, pH 8.0), or Tris/Triton/NaCl buffer (50 mM Tris-Cl, pH 8.0 or
9.0, 0.1% Triton X-100, and 0.5 M NaCl).

[0435] After washing, the specifically bound target polypeptide is eluted from the support using the high pH or 
low pH elution solutions typically employed in immunoaffinity chromatography. In particular, the elution solutions 
may contain an eluant such as triethanolamine, diethylamine, calcium chloride, sodium thiocyanate, potassium
bromide, acetic acid, or glycine. In some embodiments, the elution solution may also contain a detergent such as
Triton X-100 or octyl-β-D-glucoside.

[0436] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may also be used to clone sequences located upstream of the 5'ESTs 
which are capable of regulating gene expression, including promoter sequences, enhancer sequences, and other 
upstream sequences which influence transcription or translation levels. Once identified and cloned, these upstream 
regulatory sequences may be used in expression vectors designed to direct the expression of an inserted gene in a 
desired spatial, temporal, developmental, or quantitative fashion. Example 51 describes a method for cloning 
sequences upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments 
of positional segments of EST-related nucleic acids. 

2. Identification of Upstream Sequences with Promoting or Regulatory Activities

Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Clone Upstream Sequences from Genomic DNA

[0437] Sequences derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic acids may be used to isolate the promoters of the 
corresponding genes using chromosome walking techniques. In one chromosome walking technique, which utilizes the 
GenomeWalker™ kit available from Clontech, five complete genomic DNA samples are each digested with a different 
restriction enzyme which has a 6 base recognition site and leaves a blunt end. Following digestion, oligonucleotide 
adapters are ligated to each end of the resulting genomic DNA fragments. 

[0438] For each of the five genomic DNA libraries, a first PCR reaction is performed according to the
manufacturer's instructions using an outer adapter primer provided in the kit and an outer gene specific primer. The
gene specific primer should be selected to be specific for the 5' EST of interest and should have a melting
temperature, length, and location in the EST-related nucleic acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related nucleic acids which is consistent with its use in PCR reactions.
Each first PCR reaction contains 5 ng of genomic DNA, 5 µl of 10X Tth reaction buffer, 0.2 mM of each dNTP, 0.2 µM
each of outer adapter primer and outer gene specific primer, 1.1 mM of Mg(OAc)2, and 1 µl of the Tth polymerase 50X
mix in a total volume of 50 µl. The reaction cycle for the first PCR reaction is as follows: 1 min at 94°C / 2 sec at
94°C, 3 min at 72°C (7 cycles) / 2 sec at 94°C, 3 min at 67°C (32 cycles) / 5 min at 67°C.
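
The cycling program of the first PCR reaction can be written down as a small data structure, which makes it easy to check the stages and to total the programmed hold time. This is only a sketch: the class and function names are hypothetical, and the total ignores the ramp time between temperatures, which depends on the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    seconds: int
    temp_c: int


@dataclass(frozen=True)
class Stage:
    steps: tuple
    cycles: int = 1


# First PCR reaction as stated in the text:
# 1 min at 94 C / (2 sec at 94 C, 3 min at 72 C) x 7 /
# (2 sec at 94 C, 3 min at 67 C) x 32 / 5 min at 67 C.
FIRST_PCR = (
    Stage((Step(60, 94),)),
    Stage((Step(2, 94), Step(180, 72)), cycles=7),
    Stage((Step(2, 94), Step(180, 67)), cycles=32),
    Stage((Step(300, 67),)),
)


def programmed_seconds(program):
    """Sum of all hold times; ramping between temperatures is not included."""
    return sum(
        stage.cycles * sum(step.seconds for step in stage.steps)
        for stage in program
    )


def describe(program):
    """Return one human-readable line per stage."""
    lines = []
    for stage in program:
        steps = ", ".join(f"{s.seconds} s at {s.temp_c} C" for s in stage.steps)
        lines.append(f"{stage.cycles} x ({steps})")
    return lines


if __name__ == "__main__":
    for line in describe(FIRST_PCR):
        print(line)
    print(f"programmed hold time: {programmed_seconds(FIRST_PCR) / 60:.1f} min")
```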

[0439] The product of the first PCR reaction is diluted and used as a template for a second PCR reaction
according to the manufacturer's instructions using a pair of nested primers which are located internally on the
amplicon resulting from the first PCR reaction. For example, 5 µl of the reaction product of the first PCR reaction
mixture may be diluted 180 times. Reactions are made in a 50 µl volume having a composition identical to that of the
first PCR reaction except the nested primers are used. The first nested primer is specific for the adapter, and is
provided with the GenomeWalker™ kit. The second nested primer is specific for the particular EST-related nucleic
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic
acids for which the promoter is to be cloned and should have a melting temperature, length, and location in the
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids which is consistent with its use in PCR reactions. The reaction parameters of the second PCR
reaction are as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25
cycles) / 5 min at 67°C. The product of the second PCR reaction is purified, cloned, and sequenced using standard
techniques. 
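
The dilution of the first PCR product before the nested reaction is simple arithmetic, sketched here for the example given in the text (5 µl diluted 180 times). The function name is hypothetical.

```python
def dilution_volumes(sample_ul, fold):
    """Return (diluent_ul, final_ul) needed to dilute sample_ul by `fold`."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    if fold < 1:
        raise ValueError("dilution factor must be at least 1")
    final_ul = sample_ul * fold
    return final_ul - sample_ul, final_ul


if __name__ == "__main__":
    diluent, final = dilution_volumes(5, 180)
    print(f"add {diluent} ul diluent to 5 ul product for {final} ul total")
```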

[0440] Alternatively, two or more human genomic DNA libraries can be constructed by using two or more 
restriction enzymes. The digested genomic DNA is cloned into vectors which can be converted into single stranded, 
circular, or linear DNA. A biotinylated oligonucleotide comprising at least 15 nucleotides from the EST-related 
nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids sequence is hybridized to the single stranded DNA. Hybrids between the biotinylated oligonucleotide 
and the single stranded DNA containing the EST-related nucleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-related nucleic acids are isolated as described above. Thereafter, 
the single stranded DNA containing the EST-related nucleic acids, positional segments of EST-related nucleic acids 
or fragments of positional segments of EST-related nucleic acids is released from the beads and converted into 
double stranded DNA using a primer specific for the EST-related nucleic acids, positional segments of EST-related 
nucleic acids or fragments of positional segments of EST-related nucleic acids or a primer corresponding to a 
sequence included in the cloning vector. The resulting double stranded DNA is transformed into bacteria. cDNAs 
containing the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids are identified by colony PCR or colony hybridization. 
[0441] Once the upstream genomic sequences have been cloned and sequenced as described above, prospective 
promoters and transcription start sites within the upstream sequences may be identified by comparing the sequences 
upstream of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids with databases containing known transcription start sites, 
transcription factor binding sites, or promoter sequences. 

[0442] In addition, promoters in the upstream sequences may be identified using promoter reporter vectors as 
described in Example 53. 

EXAMPLE 53 

Identification of Promoters in Cloned Upstream Sequences 

[0443] The genomic sequences upstream of the EST-related nucleic acids, positional segments of EST-related 
nucleic acids or fragments of positional segments of EST-related nucleic acids are cloned into a suitable promoter
reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter
Reporter vectors available from Clontech. Briefly, each of these promoter reporter vectors includes multiple cloning
sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline
phosphatase, β-galactosidase, or green fluorescent protein. The sequences upstream of the EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids
are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an
appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector
which lacks an insert in the cloning site. The presence of an elevated expression level in the vector containing the
insert with respect to the control vector indicates the presence of a promoter in the insert. If necessary, the
upstream sequences can be cloned into vectors which contain an enhancer for augmenting transcription levels from
weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert
indicates that a promoter sequence is present in the inserted upstream sequence. 
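
The comparison between the insert-containing vector and the insert-less control can be sketched as a fold-over-control calculation for each insert orientation. This is an illustration only: the function names are hypothetical, and the 3-fold cutoff is an arbitrary assumption, since the text does not specify what counts as a significant level of expression.

```python
def fold_over_control(insert_signal, control_signal):
    """Reporter signal of the insert vector relative to the empty vector."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / control_signal


def call_promoter(readings, control_signal, min_fold=3.0):
    """Classify each orientation of an insert.

    readings: dict mapping orientation name to reporter signal.
    min_fold: assumed cutoff for calling promoter activity (not from the text).
    Returns a dict mapping orientation to (fold, has_promoter_activity).
    """
    result = {}
    for orientation, signal in readings.items():
        fold = fold_over_control(signal, control_signal)
        result[orientation] = (fold, fold >= min_fold)
    return result


if __name__ == "__main__":
    # Hypothetical reporter readings in arbitrary units.
    calls = call_promoter({"forward": 900.0, "reverse": 120.0}, control_signal=100.0)
    for orientation, (fold, active) in calls.items():
        print(f"{orientation}: {fold:.1f}-fold over control, promoter activity: {active}")
```

In practice the cutoff would be chosen from replicate measurements of the control vector rather than fixed in advance.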

[0444] Appropriate host cells for the promoter reporter vectors may be chosen based on the results of the above 
described determination of expression patterns of the EST-related nucleic acids, positional segments of EST-related 
nucleic acids or fragments of positional segments of EST-related nucleic acids. For example, if the expression
pattern analysis indicates that the mRNA corresponding to a particular EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is expressed
in fibroblasts, the promoter reporter vector may be introduced into a human fibroblast cell line. 

[0445] Promoter sequences within the upstream genomic DNA may be further defined by constructing nested 
deletions in the upstream DNA using conventional techniques such as Exonuclease III digestion. The resulting
deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced
or obliterated promoter activity. In this way, the boundaries of the promoters may be defined. If desired, potential
individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning
to obliterate potential transcription factor binding sites within the promoter individually or in combination. The
effects of these mutations on transcription levels may be determined by inserting the mutations into the cloning
sites in the promoter reporter vectors. 
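
Reading a promoter boundary off a nested deletion series can be sketched as follows. The sketch assumes a series of 5' deletions, each described by the length of upstream sequence it retains and by its reporter activity relative to the full-length insert; the function name, the data, and the 0.5 activity threshold are all hypothetical.

```python
def map_5prime_boundary(constructs, threshold=0.5):
    """Bracket the 5' boundary of a promoter from a deletion series.

    constructs: list of (upstream_bp_retained, relative_activity) pairs,
        with activity expressed relative to the full-length insert.
    threshold: assumed fraction of full-length activity counted as "active".
    Returns (shortest_active_bp, longest_inactive_bp); either may be None.
    """
    active = [bp for bp, activity in constructs if activity >= threshold]
    inactive = [bp for bp, activity in constructs if activity < threshold]
    shortest_active = min(active) if active else None
    longest_inactive = max(inactive) if inactive else None
    return shortest_active, longest_inactive


if __name__ == "__main__":
    # Hypothetical deletion series.
    series = [(1000, 1.0), (600, 0.9), (300, 0.8), (150, 0.1), (50, 0.05)]
    keep, lose = map_5prime_boundary(series)
    print(f"boundary lies between {lose} and {keep} bp upstream")
```

If the shortest active construct is shorter than the longest inactive one, activity is not changing monotonically with deletion length, which may point to a negative regulatory element rather than a simple boundary.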



EXAMPLE 54 



Cloning and Identification of Promoters 

[0446] Using the method described in Example 51 above with 5' ESTs, sequences upstream of several genes were 
obtained. Using the primer pair GGG AAG ATG GAG ATA GTA TTG CCT G (SEQ ID NO:15) and CTG CCA TGT ACA
TGA TAG AGA GAT TC (SEQ ID NO:16), the promoter having the internal designation P13H2 (SEQ ID NO:17) was
obtained.

[0447] Using the primer pair GTA CCA GGGG ACT GTG ACC ATT GC (SEQ ID NO:18) and CTG TGA CCA TTG
CTC CCA AGA GAG (SEQ ID NO:19), the promoter having the internal designation P15B4 (SEQ ID NO:20) was
obtained.

[0448] Using the primer pair CTG GGA TGG AAG GCA CGG TA (SEQ ID NO:21) and GAG ACC ACA CAG CTA
GAC AA (SEQ ID NO:22), the promoter having the internal designation P29B6 (SEQ ID NO:23) was obtained.
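
Basic properties of such primers (length, GC content, and a rough melting temperature) can be checked with a few lines of code. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is a crude estimate intended for short oligonucleotides and is not the method used in the text; the function names are hypothetical. The two sequences are the primers given for promoter P29B6.

```python
def normalize(seq):
    """Strip whitespace and upper-case a primer written in triplet groups."""
    cleaned = "".join(seq.split()).upper()
    if not cleaned or set(cleaned) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return cleaned


def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    s = normalize(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq):
    """Wallace-rule melting temperature estimate in degrees C."""
    s = normalize(seq)
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primers = {
        "SEQ ID NO:21": "CTG GGA TGG AAG GCA CGG TA",
        "SEQ ID NO:22": "GAG ACC ACA CAG CTA GAC AA",
    }
    for name, seq in primers.items():
        print(f"{name}: {len(normalize(seq))} nt, "
              f"GC {gc_fraction(seq):.0%}, Wallace Tm about {wallace_tm(seq)} C")
```

For longer primers a nearest-neighbor thermodynamic model gives more realistic values than the Wallace rule.
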
[0449] Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with 
the corresponding 5' tags. The upstream sequences were screened for the presence of motifs resembling transcription 
factor binding sites or known transcription start sites using the computer program MatInspector release 2.0, August
1996.

[0450] Figure 5 describes the transcription factor binding sites present in each of these promoters. The column
labeled "matrice" provides the name of the MatInspector matrix used. The column labeled "position" provides the 5'
position of the promoter site. Numeration of the sequence starts from the transcription start site as determined by
matching the genomic sequence with the 5' EST sequence. The column labeled "orientation" indicates the DNA strand
on which the site is found, with the + strand being the coding strand as determined by matching the genomic sequence
with the sequence of the 5' EST. The column labeled "score" provides the MatInspector score found for this site. The
column labeled "length" provides the length of the site in nucleotides. The column labeled "sequence" provides the 
sequence of the site found. 
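
A table with position, orientation, length, and sequence columns can be produced by a simple motif scan, sketched below. This is not MatInspector: the sketch uses exact string matching of a single consensus on both strands instead of matrix-based scoring, so it has no score column. The function names, the coordinate convention (the transcription start base is +1, the base before it is -1), and the test sequence are assumptions made for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def to_tss_coordinate(index, tss_index):
    """Map a 0-based index to a coordinate with the TSS base at +1 (no zero)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def scan_motif(sequence, motif, tss_index):
    """Find exact occurrences of motif on both strands.

    Positions refer to the leftmost base of the site on the + strand;
    the reported sequence is the + strand text at that site.
    """
    sequence, motif = sequence.upper(), motif.upper()
    strands = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be reported twice
        strands.append(("-", rc))
    hits = []
    for orientation, pattern in strands:
        # Lookahead so that overlapping occurrences are all reported.
        for match in re.finditer(f"(?={re.escape(pattern)})", sequence):
            start = match.start()
            hits.append({
                "position": to_tss_coordinate(start, tss_index),
                "orientation": orientation,
                "length": len(motif),
                "sequence": sequence[start:start + len(motif)],
            })
    return sorted(hits, key=lambda hit: hit["position"])


if __name__ == "__main__":
    upstream = "GGTATAAAGGCCTTTATACC"  # hypothetical upstream sequence
    for hit in scan_motif(upstream, "TATAAA", tss_index=len(upstream)):
        print(hit)
```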

[0451] Bacterial clones containing plasmids containing the promoter sequences described above
are presently stored in the inventor's laboratories under the internal identification numbers provided above. The
inserts may be recovered from the deposited materials by growing an aliquot of the appropriate bacterial clone in
the appropriate medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those
skilled in the art such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If
desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion
chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be
manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done
with primers designed at both ends of the inserted EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of EST-related nucleic acids. The PCR product which corresponds to
the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments
of EST-related nucleic acids can then be manipulated using standard cloning techniques familiar to those skilled in 
the art. 

[0452] The promoters and other regulatory sequences located upstream of the EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids
may be used to design expression vectors capable of directing the expression of an inserted gene in a desired
spatial, temporal, developmental, or quantitative manner. A promoter capable of directing the desired spatial,
temporal, developmental, and quantitative patterns may be selected using the results of the expression analysis
described above. For example, if a promoter which confers a high level of expression in muscle is desired, the
promoter sequence upstream of EST-related nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids derived from an mRNA which is expressed at a high
level in muscle, as determined by the methods above, may be used in the expression vector.

[0453] Preferably, the desired promoter is placed near multiple restriction sites to facilitate the cloning of
the insert downstream of the promoter, such that the promoter is able to drive expression of the inserted
gene. The promoter may be inserted in conventional nucleic acid backbones designed for extrachromosomal
replication, integration into the host chromosomes or transient expression. Suitable backbones for the present
expression vectors include retroviral backbones, backbones from eukaryotic episomes such as SV40 or Bovine
Papilloma Virus, backbones from bacterial episomes, or artificial chromosomes.

[0454] Preferably, the expression vectors also include a polyA signal downstream of the multiple restriction
sites for directing the polyadenylation of mRNA transcribed from the gene inserted into the expression vector.

[0455] Following the identification of promoter sequences using the procedures of Examples 51-54, proteins which
interact with the promoter may be identified as described in Example 55 below.

EXAMPLE 55 

Identification of Proteins Which Interact with Promoter Sequences, Upstream Regulatory Sequences, or mRNA

[0456] Sequences within the promoter region which are likely to bind transcription factors may be identified by
homology to known transcription factor binding sites or through conventional mutagenesis or deletion analyses of
reporter plasmids containing the promoter sequence. For example, deletions may be made in a reporter plasmid
containing the promoter sequence of interest operably linked to an assayable reporter gene. The reporter plasmids
carrying various deletions within the promoter region are transfected into an appropriate host cell and the effects
of the deletions on expression levels are assessed. Transcription factor binding sites within the regions in which
deletions reduce expression levels may be further localized using site directed mutagenesis, linker scanning
analysis, or other techniques familiar to those skilled in the art.

[0457] Nucleic acids encoding proteins which interact with sequences in the promoter may be identified using
one-hybrid systems such as those described in the manual accompanying the Matchmaker One-Hybrid System kit available
from Clontech (Catalog No. K1603-1). Briefly, the Matchmaker One-Hybrid System is used as follows. The target
sequence for which it is desired to identify binding proteins is cloned upstream of a selectable reporter gene and
integrated into the yeast genome. Preferably, multiple copies of the target sequences are inserted into the reporter
plasmid in tandem. A library comprised of fusions between cDNAs to be evaluated for the ability to bind to the
promoter and the activation domain of a yeast transcription factor, such as GAL4, is transformed into the yeast
strain containing the integrated reporter sequence. The yeast are plated on selective media to select cells
expressing the selectable marker linked to the promoter sequence. The colonies which grow on the selective media
contain genes encoding proteins which bind the target sequence. The inserts in the genes encoding the fusion
proteins are further characterized by sequencing. In addition, the inserts may be inserted into expression vectors or in
vitro transcription vectors. Binding of the polypeptides encoded by the inserts to the promoter DNA may be confirmed
by techniques familiar to those skilled in the art, such as gel shift analysis or DNAse protection analysis.

VII. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids In Gene Therapy 

[0458] The present invention also comprises the use of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in gene therapy strategies,
including antisense and triple helix strategies as described in Examples 56 and 57 below. In antisense approaches,
nucleic acid sequences complementary to an mRNA are hybridized to the mRNA intracellularly, thereby blocking the
expression of the protein encoded by the mRNA. The antisense sequences may prevent gene expression through a
variety of mechanisms. For example, the antisense sequences may inhibit the ability of ribosomes to translate the
mRNA. Alternatively, the antisense sequences may block transport of the mRNA from the nucleus to the cytoplasm,
thereby limiting the amount of mRNA available for translation. Another mechanism through which antisense sequences
may inhibit gene expression is by interfering with mRNA splicing. In yet another strategy, the antisense nucleic
acid may be incorporated in a ribozyme capable of specifically cleaving the target mRNA. 

EXAMPLE 56 

Preparation and Use of Antisense Oligonucleotides

[0459] The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences.
They may comprise a sequence complementary to the sequence of the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. The antisense nucleic
acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having
sufficient stability to inhibit the expression of the mRNA in the duplex. Strategies for designing antisense nucleic
acids suitable for use in gene therapy are disclosed in Green et al., Ann. Rev. Biochem. 55:569-597

[0460] In some strategies, antisense molecules are obtained from a nucleotide sequence encoding a protein by
reversing the orientation of the coding region with respect to a promoter so as to transcribe the opposite strand
from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in
vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another
approach involves transcription of the antisense nucleic acids in vivo by operably linking DNA containing the antisense
sequence to a promoter in an expression vector. 

[0461] Alternatively, oligonucleotides which are complementary to the strand normally transcribed in the cell 
may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the corresponding mRNA and are 
capable of hybridizing to the mRNA to create a duplex. In some embodiments, the antisense sequences may contain
modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of
modifications suitable for use in antisense strategies are described by Rossi et al., Pharmacol. Ther. 50(2):245-254
(1991).
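
The complementarity requirement can be made concrete with a short sketch that derives the DNA antisense oligonucleotide for a stretch of mRNA by taking its reverse complement. The function name and the example segment are hypothetical; choosing which region of a real mRNA to target involves considerations (accessibility, specificity, stability) that this sketch does not address.

```python
# Maps each mRNA base to the DNA base that pairs with it.
_MRNA_TO_ANTISENSE_DNA = str.maketrans("ACGU", "TGCA")


def antisense_dna(mrna_segment):
    """Return the DNA oligonucleotide (5'->3') complementary to an mRNA segment.

    The segment may be written with U or T and may contain whitespace.
    """
    s = "".join(mrna_segment.split()).upper().replace("T", "U")
    if not s or set(s) - set("ACGU"):
        raise ValueError("segment must be a non-empty string of A, C, G, U/T")
    # Complement each base, then reverse so the result reads 5'->3'.
    return s.translate(_MRNA_TO_ANTISENSE_DNA)[::-1]


if __name__ == "__main__":
    target = "AUG GCC"  # hypothetical mRNA segment, 5'->3'
    print(f"target mRNA:    5'-{''.join(target.split())}-3'")
    print(f"antisense oligo: 5'-{antisense_dna(target)}-3'")
```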

[0462] Various types of antisense oligonucleotides complementary to the sequence of the EST-related nucleic 
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids may be used. In one preferred embodiment, stable and semi-stable antisense oligonucleotides described in 
International Application No. WO 94/23026 are used. In these molecules, the 3' end or both the 3' and 5' ends are
engaged in intramolecular hydrogen bonding between complementary base pairs. These molecules are better able to
withstand exonuclease attacks and exhibit increased stability compared to conventional antisense oligonucleotides.

[0463] In another preferred embodiment, the antisense oligodeoxynucleotides against herpes simplex virus types 1
and 2 described in International Application No. WO 95/04141 are used. 

[0464] In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides described in 
International Application No. WO 96/31623 are used. These double- or single-stranded oligonucleotides comprise one
or more, respectively, inter- or intra-oligonucleotide covalent cross-linkages, wherein the linkage consists of an
amide bond between a primary amine group of one strand and a carboxyl group of the other strand or of the same
strand, respectively, the primary amine group being directly substituted in the 2' position of the strand nucleotide
monosaccharide ring, and the carboxyl group being carried by an aliphatic spacer group substituted on a nucleotide
or nucleotide analog of the other strand or the same strand, respectively.

[0465] The antisense oligodeoxynucleotides and oligonucleotides disclosed in International Application No. WO
[...] may also be used. These molecules are stable to degradation and contain at least one transcription control
recognition sequence which binds to control proteins and are effective as decoys therefor. These molecules may
contain "hairpin" structures, "dumbbell" structures, "modified dumbbell" structures, "cross-linked" decoy structures
and "loop" structures.

[0466] In another preferred embodiment, the cyclic double-stranded oligonucleotides described in European Patent
Application No. 0 572 287 A2 are used. These ligated oligonucleotide "dumbbells" contain the binding site for a transcription
factor and inhibit expression of the gene under control of the transcription factor by sequestering the factor. 

[0467] Use of the closed antisense oligonucleotides disclosed in International Application No. WO 92/19732 is
also contemplated. Because these molecules have no free ends, they are more resistant to degradation by
exonucleases than are conventional oligonucleotides. These oligonucleotides may be multifunctional, interacting with
several regions which are not adjacent to the target mRNA. 

[0468] The appropriate level of antisense nucleic acids required to inhibit gene expression may be determined 
using in vitro expression analysis. The antisense molecule may be introduced into the cells by diffusion, injection,
infection or transfection using procedures known in the art. For example, the antisense nucleic acids can be
introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide
sequence encapsidated by viral protein, or as an oligonucleotide operably linked to a promoter contained in an
expression vector. The expression vector may be any of a variety of expression vectors known in the art, including
retroviral or viral vectors, vectors capable of extrachromosomal replication, or integrating vectors. The vectors
may be DNA or RNA.

[0469] The antisense molecules are introduced onto cell samples at a number of different concentrations,
preferably between 1x10^-10 M and 1x10^-4 M. Once the minimum concentration that can adequately control gene
expression is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an
inhibiting concentration in culture of 1x10^-7 M translates into a dose of approximately 0.6 mg/kg bodyweight. Levels
of oligonucleotide approaching 100 mg/kg bodyweight or higher may be possible after testing the toxicity of the
oligonucleotide in laboratory animals. It is additionally contemplated that cells from the vertebrate are removed,
treated with the antisense oligonucleotide, and reintroduced into the vertebrate.

[0470] It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme 
sequence to enable the antisense to specifically bind and cleave its target mRNA. For technical applications of 
ribozyme and antisense oligonucleotides see Rossi et al., supra. 

[0471] In a preferred application of this invention, the polypeptide encoded by the gene is first identified so 
that the effectiveness of antisense inhibition on translation can be monitored using techniques that include but are 
not limited to antibody-mediated tests such as RIAs and ELISA, functional assays, or radiolabeling. 
[0472] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may also be used in gene therapy approaches based on intracellular 
triple helix formation. Triple helix oligonucleotides are used to inhibit transcription from a genome. They are 
particularly useful for studying alterations in cell activity as it is associated with a particular gene. The 
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of 
EST-related nucleic acids of the present invention or, more preferably, a portion of those sequences, can be used to 
inhibit gene expression in individuals having diseases associated with expression of a particular gene. Similarly, 
the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments 
of EST-related nucleic acids can be used to study the effect of inhibiting transcription of a particular gene within 
a cell. Traditionally, homopurine sequences were considered the most useful for triple helix strategies. However, 
homopyrimidine sequences can also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major 
groove at homopurine:homopyrimidine sequences. Thus, both types of sequences from the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids 
are contemplated within the scope of this invention. 

EXAMPLE 57 

Preparation and Use of Triple Helix Probes 

[0473] The sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic acids are scanned to identify 10-mer to 20-mer 
homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting gene 
expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in 
inhibiting gene expression is assessed by introducing varying amounts of oligonucleotides containing the candidate 
sequences into tissue culture cells which normally express the target gene. The oligonucleotides may be prepared on 
an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom 
oligonucleotide synthesis, such as GENSET, Paris, France. 
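
The scanning step of this paragraph may be sketched as follows. This is a minimal illustration, assuming a plain DNA string as input; the function name and the returned tuple layout are hypothetical. The sketch reports maximal homopurine (A/G) or homopyrimidine (C/T) runs of at least 10 bases, and a run longer than 20 bases would still need to be trimmed to a 10-mer to 20-mer candidate by the user.

```python
import re


def find_homo_stretches(seq, min_len=10):
    """Return maximal homopurine / homopyrimidine runs of at least `min_len` bases.

    Each hit is (kind, start, end, run) with 0-based, end-exclusive coordinates.
    """
    seq = seq.upper()
    hits = []
    for kind, base_class in (("homopurine", "[AG]"), ("homopyrimidine", "[CT]")):
        # The regex is greedy, so each match is a maximal run of the base class.
        for match in re.finditer(base_class + "{%d,}" % min_len, seq):
            hits.append((kind, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical test sequence, not one of the SEQ ID NOs.
    example = "TC" + "AGGAAGGAGAAG" + "CA" + "CTTCCTCTTCTC" + "AG"
    for hit in find_homo_stretches(example):
        print(hit)
```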

[0474] The oligonucleotides may be introduced into the cells using a variety of methods known to those skilled 
in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, 
liposome-mediated transfection or native uptake. 

[0475] Treated cells are monitored for altered cell function or reduced gene expression using techniques such as 
Northern blotting, RNase protection assays, or PCR based strategies to monitor the transcription levels of the 
target gene in cells which have been treated with the oligonucleotide. The cell functions to be monitored are 
predicted based upon the homologies of the target genes corresponding to the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids from which 
the oligonucleotides were derived with known gene sequences that have been associated with a particular function. The 
cell functions can also be predicted based on the presence of abnormal physiologies within cells derived from 
individuals with a particular inherited disease, particularly when the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are 
associated with the disease using techniques described herein. 

[0476] The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then 
be introduced in vivo using the techniques described above and in Example 56 at a dosage calculated based on the in 
vitro results, as described in Example 56. 

[0477] In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha 
anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium 
bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For 
information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al. (Science 245: 
967-971 (1989)). 

EXAMPLE 58 

Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids to express an Encoded Protein in a Host Organism 

[0478] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may also be used to express an encoded protein or polypeptide in a 
host organism to produce a beneficial effect. In addition, nucleic acids encoding the EST-related polypeptides, 
positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides may 
be used to express the encoded protein or polypeptide in a host organism to produce a beneficial effect. 
[0479] In such procedures, the encoded protein or polypeptide may be transiently expressed in the host organism 
or stably expressed in the host organism. The encoded protein or polypeptide may have any of the activities 
described above. The encoded protein or polypeptide may be a protein or polypeptide which the host organism lacks or, 
alternatively, the encoded protein may augment the existing levels of the protein in the host organism. 
[0480] In some embodiments in which the protein or polypeptide is secreted, nucleic acids encoding the full 
length protein (i.e. the signal peptide and the mature protein), or nucleic acids encoding only the mature protein 
(i.e. the protein generated when the signal peptide is cleaved off) are introduced into the host organism. 
[0481] The nucleic acids encoding the proteins or polypeptides may be introduced into the host organism using a 
variety of techniques known to those of skill in the art. For example, the extended cDNA may be injected into the 
host organism as naked DNA such that the encoded protein is expressed in the host organism thereby producing a 
beneficial effect. 
[0482] Alternatively, the nucleic acids encoding the protein or polypeptide may be cloned into an expression 
vector downstream of a promoter which is active in the host organism. The expression vector may be any of the 
expression vectors designed for use in gene therapy, including viral or retroviral vectors. The expression vector 
may be directly introduced into the host organism such that the encoded protein is expressed in the host organism to 
produce a beneficial effect. In another approach, the expression vector may be introduced into cells in vitro. Cells 
containing the expression vector are thereafter selected and introduced into the host organism, where they express 
the encoded protein or polypeptide to produce a beneficial effect. 

EXAMPLE 59 

Use of Signal Peptides To Import Proteins Into Cells 

[0483] The short core hydrophobic region (h) of signal peptides encoded by the sequences of SEQ ID NOs: 24-652 
and 3721-3811 may also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into 
tissue culture cells (Lin et al., J. Biol. Chem., 270: 14225-14258 (1995); Du et al., J. Peptide Res., 51: 235-243 (1998); 
Rojas et al., Nature Biotech., 16: 370-375 (1998)). 

[0484] When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be translocated 
across the cell membrane, chemical synthesis may be used in order to add the h region to either the C-terminus or the 
N-terminus of the cargo peptide of interest. Alternatively, when longer peptides or proteins are to be imported into 
cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled in the art, in order 
to link the extended cDNA sequence encoding the h region to the 5' or the 3' end of a DNA sequence coding for a 
cargo polypeptide. Such genetically engineered nucleic acids are then translated either in vitro or in vivo after 
transfection into appropriate cells, using conventional techniques to produce the resulting cell permeable 
polypeptide. Suitable host cells are then simply incubated with the cell permeable polypeptide, which is then 
translocated across the membrane. 

[0485] This method may be applied to study diverse intracellular functions and cellular processes. For instance, 
it has been used to probe functionally relevant domains of intracellular proteins and to examine protein-protein 
interactions involved in signal transduction pathways (Lin et al., supra; Lin et al., J. Biol. Chem., 271: 5305-5308 (1996); 
Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); 
Rojas et al., Bioch. Biophys. Res. Commun., 234: 675-680 (1997)). 

[0486] Such techniques may be used in cellular therapy to import proteins producing therapeutic effects. For 
instance, cells isolated from a patient may be treated with imported therapeutic proteins and then re-introduced 
into the host organism. 

[0487] Alternatively, the h region of signal peptides of the present invention could be used in combination with 
a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such oligonucleotides may be antisense 
oligonucleotides or oligonucleotides designed to form triple helixes, as described above, in order to inhibit 
processing and maturation of a target cellular RNA. 

EXAMPLE 60 

Computer Embodiments 

[0488] As used herein the term "nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681" encompasses the 
nucleotide sequences of SEQ ID NOs: 24-4100 and 8178-36681, fragments of SEQ ID NOs: 24-4100 and 8178-36681, 
nucleotide sequences homologous to SEQ ID NOs: 24-4100 and 8178-36681 or homologous to fragments of SEQ ID 
NOs: 24-4100 and 8178-36681, and sequences complementary to all of the preceding sequences. The fragments 
include portions of SEQ ID NOs: 24-4100 and 8178-36681 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 
150, 200, 300, 400, or 500 consecutive nucleotides of SEQ ID NOs: 24-4100 and 8178-36681. Preferably, the 
fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOs: 24-4100 and 8178-36681 refer 
to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these sequences. 
Homology may be determined using any of the computer programs and parameters described in Example 18, including 
BLAST2N with the default parameters or with any modified parameters. Homologous sequences also include RNA 
sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681. 
The homologous sequences may be obtained using any of the procedures described herein or may result from the 
correction of a sequencing error as described above. It will be appreciated that the nucleic acid codes of SEQ ID 
NOs: 24-4100 and 8178-36681 can be represented in the traditional single character format (See the inside back cover 
of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records 
the identity of the nucleotides in a sequence. 
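
By way of a hedged illustration, the derived forms of a nucleic acid code named in this paragraph (the complementary sequence, the RNA form in which uridines replace thymines, and fragments of a given number of consecutive nucleotides) can each be computed from a single-character representation. The helper names below are hypothetical and the input is assumed to contain only the characters A, C, G and T.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna):
    """Return the RNA form of the sequence, with U in place of T."""
    return dna.replace("T", "U").replace("t", "u")


def fragments(seq, length):
    """Return every run of `length` consecutive characters in `seq`."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    code = "ATGCA"  # hypothetical toy sequence, not one of the SEQ ID NOs
    print(reverse_complement(code), to_rna(code), fragments(code, 3))
```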

[0489] As used herein the term "polypeptide codes of SEQ ID NOs: 4101-8177" encompasses the polypeptide 
sequences of SEQ ID NOs: 4101-8177 which are encoded by the 5' ESTs of SEQ ID NOs: 24-4100 and 8178-36681, 
polypeptide sequences homologous to the polypeptides of SEQ ID NOs: 4101-8177, or fragments of any of the 
preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 
97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to one of the polypeptide sequences of SEQ ID NOs: 4101-8177. 
Homology may be determined using any of the computer programs and parameters described herein, including FASTA 
with the default parameters or with any modified parameters. The homologous sequences may be obtained using any of 
the procedures described herein or may result from the correction of a sequencing error as described above. The 
polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids 
of the polypeptides of SEQ ID NOs: 4101-8177. Preferably, the fragments are novel fragments. It will be appreciated 
that the polypeptide codes of the SEQ ID NOs: 4101-8177 can be represented in the traditional single character 
format or three letter format (See the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & 
Co., New York.) or in any other format which relates the identity of the polypeptides in a sequence. 
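
The two representations mentioned in this paragraph are interconvertible. The following sketch assumes only the twenty standard amino acids and a hyphen-separated three letter form; the function names and the separator are illustrative choices rather than anything prescribed by the specification.

```python
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(peptide):
    """Convert a single character peptide string to hyphen-separated three letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in peptide.upper())


def to_one_letter(peptide):
    """Convert hyphen-separated three letter codes back to single characters."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in peptide.split("-"))


if __name__ == "__main__":
    print(to_three_letter("MKV"))          # hypothetical toy peptide
    print(to_one_letter("Met-Lys-Val"))
```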
[0490] It will be appreciated by those skilled in the art that the nucleic acid codes of SEQ ID NOs: 24-4100 and 
8178-36681 and polypeptide codes of SEQ ID NOs: 4101-8177 can be stored, recorded, and manipulated on any 
medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a 
process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known 
methods for recording information on a computer readable medium to generate manufactures comprising one or more 
of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681, or one or more of the polypeptide codes of SEQ ID 
NOs: 4101-8177. Another aspect of the present invention is a computer readable medium having recorded thereon at 
least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681. Another aspect of 
the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 
polypeptide codes of SEQ ID NOs: 4101-8177. 
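
One possible way of recording and retrieving such codes, offered only as a sketch, is the widely used FASTA text layout (a ">" header line followed by the sequence). The specification does not require this format; the function names, the record names and the 60-character line width are assumptions for illustration, and an in-memory text buffer stands in for the computer readable medium.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs to a text handle in FASTA layout."""
    for name, seq in records:
        handle.write(">" + name + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_fasta(handle):
    """Read FASTA text from a handle and return a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical records; real use would store the nucleic acid or polypeptide codes.
    stored = [("nucleic_acid_example", "ACGT" * 20), ("polypeptide_example", "MKVL")]
    medium = io.StringIO()
    write_fasta(stored, medium)
    medium.seek(0)
    print(read_fasta(medium) == stored)
```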

[0491] Computer readable media include magnetically readable media, optically readable media, electronically 
readable media and magnetic/optical media. For example, the computer readable media may be a hard disc, a floppy 
disc, a magnetic tape, CD-ROM, DVD, RAM, or ROM as well as other types of media known to those skilled in the 
art. 

[0492] Embodiments of the present invention include systems, particularly computer systems, which contain the 
sequence information described herein. As used herein, "a computer system" refers to the hardware components, 
software components, and data storage components used to analyze the nucleotide sequences of the nucleic acid 
codes of SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the polypeptide codes of SEQ ID 
NOs: 4101-8177. The computer system preferably includes the computer readable media described above, and a 
processor for accessing and manipulating the sequence data. 

[0493] Preferably, the computer is a general purpose system that comprises a central processing unit (CPU), one 
or more data storage components for storing data, and one or more data retrieving devices for retrieving the data 
stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently 
available computer systems is suitable. 

[0494] In one particular embodiment, the computer system includes a processor connected to a bus which is 
connected to a main memory (preferably implemented as RAM) and one or more data storage devices, such as a hard 
drive and/or other computer readable media having data recorded thereon. In some embodiments, the computer system 
further includes one or more data retrieving devices for reading the data stored on the data storage components. The 
data retrieving device may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, 
etc. In some embodiments, the data storage component is a removable computer readable medium such as a floppy 
disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer 
system may advantageously include or be programmed by appropriate software for reading the control logic and/or the 
data from the data storage component once inserted in the data retrieving device. Software for accessing and 
processing the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the amino 
acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177 (such as search tools, compare tools, and 
modeling tools etc.) may reside in main memory during execution. 

[0495] In some embodiments, the computer system may further comprise a sequence comparer for comparing the 
above-described nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide codes of SEQ ID NOs: 
4101-8177 stored on a computer readable medium to reference nucleotide or polypeptide sequences stored on a 
computer readable medium. A "sequence comparer" refers to one or more programs which are implemented on the 
computer system to compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences 
and/or compounds including but not limited to peptides, peptidomimetics, and chemicals stored within the data 
storage means. For example, the sequence comparer may compare the nucleotide sequences of the nucleic acid codes 
of SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the polypeptide codes of SEQ ID NOs: 
4101-8177 stored on a computer readable medium to reference sequences stored on a computer readable medium to 
identify homologies, motifs implicated in biological function, or structural motifs. The various sequence comparer 
programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of 
the invention. 

[0496] Accordingly, one aspect of the present invention is a computer system comprising a processor, a data 
storage device having stored thereon a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 or a polypeptide 
code of SEQ ID NOs: 4101-8177, a data storage device having retrievably stored thereon reference nucleotide 
sequences or polypeptide sequences to be compared to the nucleic acid code of SEQ ID NOs: 24-4100 and 8178- 
36681 or polypeptide code of SEQ ID NOs: 4101-8177 and a sequence comparer for conducting the comparison. The 
sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the 
above described nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and polypeptide codes of SEQ ID NOs: 
4101-8177 or it may identify structural motifs in sequences which are compared to these nucleic acid codes and 
polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at least 
2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide codes 
of SEQ ID NOs: 4101-8177. 
[0497] Another aspect of the present invention is a method for determining the level of homology between a 
nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a reference nucleotide sequence comprising the 
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a computer program 
which determines homology levels and determining homology between the nucleic acid code and the reference 
nucleotide sequence with the computer program. The computer program may be any of a number of computer 
programs for determining homology levels, including those specifically enumerated herein, including BLAST2N with the 
default parameters or with any modified parameters. The method may be implemented using the computer systems 
described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described 
nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 through use of the computer program and determining 
homology between the nucleic acid codes and reference nucleotide sequences. 
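
The following sketch illustrates the idea of computing a homology level between a stored code and a reference sequence. It is not BLAST2N and does not reproduce its heuristics or scoring; it is a small Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -2) that reports percent identity over the aligned columns. The function name and scoring values are hypothetical.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity over the columns of one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the corner, counting identical columns and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 100.0


if __name__ == "__main__":
    # Hypothetical toy sequences; a threshold such as 90% could then be applied.
    print(global_identity("ACGTACGTAC", "ACGTACGTAT"))
```

A quadratic-time alignment of this kind is only practical for short sequences; the programs named in the specification exist precisely because database-scale comparison needs faster heuristics.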

[0498] Alternatively, the computer program may be a computer program which compares the nucleotide sequences 
of the nucleic acid codes of the present invention to reference nucleotide sequences in order to determine whether 
the nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 differs from a reference nucleic acid sequence at one 
or more positions. Optionally such a program records the length and identity of inserted, deleted or substituted 
nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID 
NOs: 24-4100 and 8178-36681. In one embodiment, the computer program may be a program which determines 
whether the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 contain a single 
nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence. This single nucleotide polymorphism 
may comprise a single base substitution, insertion, or deletion. 
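
A minimal sketch of such a difference-recording program is given below. It assumes the two sequences have already been aligned by some other tool and are supplied as equal-length strings in which "-" marks a gap; the function name, the tuple layout and the 1-based reference coordinates are illustrative assumptions.

```python
def find_differences(query_aln, ref_aln):
    """List substitutions, insertions and deletions of the query relative to the reference.

    Both inputs are rows of one pairwise alignment (equal length, '-' for gaps).
    Each difference is (kind, ref_position, ref_base, query_base), where
    ref_position is the 1-based reference coordinate; for an insertion it is the
    coordinate of the last reference base before the inserted base.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_position = 0
    for query_base, ref_base in zip(query_aln.upper(), ref_aln.upper()):
        if ref_base != "-":
            ref_position += 1
        if query_base == ref_base:
            continue
        if ref_base == "-":
            differences.append(("insertion", ref_position, "-", query_base))
        elif query_base == "-":
            differences.append(("deletion", ref_position, ref_base, "-"))
        else:
            differences.append(("substitution", ref_position, ref_base, query_base))
    return differences


if __name__ == "__main__":
    # Hypothetical aligned pair, not derived from the SEQ ID NOs.
    for difference in find_differences("ACGTTA-C", "ACCT-AGC"):
        print(difference)
```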

[0499] Another aspect of the present invention is a method for determining the level of homology between a 
polypeptide code of SEQ ID NOs: 4101-8177 and a reference polypeptide sequence, comprising the steps of reading 
the polypeptide code of SEQ ID NOs: 4101-8177 and the reference polypeptide sequence through use of a computer 
program which determines homology levels and determining homology between the polypeptide code and the reference 
polypeptide sequence using the computer program. 

[0500] Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid 
code of SEQ ID NOs: 24-4100 and 8178-36681 differs at one or more nucleotides from a reference nucleotide 
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of 
a computer program which identifies differences between nucleic acid sequences and identifying differences between 
the nucleic acid code and the reference nucleotide sequence with the computer program. In some embodiments the 
computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by 
the computer systems described above. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, 
or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 and the reference nucleotide sequences 
through the use of the computer program and identifying differences between the nucleic acid codes and the reference 
nucleotide sequences with the computer program. 

[0501] In other embodiments the computer based system may further comprise an identifier for identifying 
features within the nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the 
amino acid sequences of the polypeptide codes of SEQ ID NOs: 4101-8177. 

[0502] An "identifier" refers to one or more programs which identify certain features within the above-described 
nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the amino acid 
sequences of the polypeptide codes of SEQ ID NOs: 4101-8177. In one embodiment, the identifier may comprise a 
program which identifies an open reading frame in the cDNA codes of SEQ ID NOs: 24-4100 and 8178-36681. 
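
A simple open reading frame identifier of the kind contemplated here might look like the sketch below. It rests on stated assumptions: only the three forward frames are scanned, an ORF starts at the first ATG in a frame and ends at the next in-frame stop codon (TAA, TAG or TGA), and ORFs that run off the 3' end without a stop codon, which can occur in 5' ESTs, are not reported. The function name and return layout are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return complete forward-strand ORFs as (start, end, frame) tuples.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    `min_codons` counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    example = "GGATGAAACCCTAAGG"  # hypothetical toy sequence
    for start, end, frame in find_orfs(example):
        print(start, end, frame, example[start:end])
```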
[0503] In another embodiment, the identifier may comprise a molecular modeling program which determines the 
3-dimensional structure of the polypeptide codes of SEQ ID NOs: 4101-8177. In some embodiments, the molecular 
modeling program identifies target sequences that are most compatible with profiles representing the structural 
environments of the residues in known three-dimensional protein structures. (See, e.g., Eisenberg et al., U.S. Patent 
No. 5,436,850 issued July 25, 1995). In another technique, the known three-dimensional structures of proteins in a 
given family are superimposed to define the structurally conserved regions in that family. This protein modeling 
technique also uses the known three-dimensional structure of a homologous protein to approximate the structure of 
the polypeptide codes of SEQ ID NOs: 4101-8177. (See e.g., Srinivasan, et al., U.S. Patent No. 5,557,535 issued 
September 17, 1996). Conventional homology modeling techniques have been used routinely to build models of 
proteases and antibodies. (Sowdhamini et al., Protein Engineering 10:207, 215 (1997)). Comparative approaches can 
also be used to develop three-dimensional protein models when the protein of interest has poor sequence identity to 
template proteins. In some cases, proteins fold into similar three-dimensional structures despite having very weak 
sequence identities. For example, a number of helical cytokines fold into a similar three-dimensional topology in 
spite of weak sequence homology. 

[0504] The recent development of threading methods now enables the identification of likely folding patterns in 
a number of situations where the structural relatedness between target and template(s) is not detectable at the 
sequence level. In hybrid methods, fold recognition is performed using Multiple Sequence Threading (MST), 
structural equivalencies are deduced from the threading output using a distance geometry program DRAGON to 
construct a low resolution model, and a full-atom representation is constructed using a molecular modeling package 
such as QUANTA. 

[0505] According to this 3-step approach, candidate templates are first identified by using the novel fold 
recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned sequences onto 
one or more 3-D structures. In a second step, the structural equivalencies obtained from the MST output are 
converted into interresidue distance restraints and fed into the distance geometry program DRAGON together with 
[...] low resolution model [...] In a third step, [...] 

[...] at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 [...] 

[...] MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), [...] Discover (Molecular Simulations Inc.), 
CHARMm (Molecular Simulations Inc.), [...] DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), 
[...] ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular 
Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), 
SeqFold (Molecular Simulations Inc.), [...] 

[...] helices and beta sheets, signal sequences encoding signal peptides which direct the secretion of the [...] 



EXAMPLE 61 

Methods of Making Nucleic Acids 



[...] EST-related nucleic acids, fragments of EST-[...] 

[...] included the 3' [...] 

[...] In many of these methods, synthesis is conducted on a solid support. [...] oligonucleotides may be prepared as 
described in U.S. Patent No. 5,049,656 [...] are ligated together to generate longer [...] described above. 
Methods of Making Polypeptides 



EX ^n^l!^j!^^^ rT'* ° f makin9 the P ol »«des encoded by EST-related 
fragments of positional ^tndTL^J.^ S^J^J ° f »• EST "^^ nueto acidl o 
polypeptides, fragments of EST-related mLmum! rl > ! aC ' dS and methods of making the EST-related 

■ its- - - *<« * ■5^»j^^j---t 

added posses** blocking groups on its aS SoSJw m^S. ^ «*» «*l *> be 

[...] EST-related nucleic acids can [...] the polynucleotides can be used to express recombinant protein for analysis, 
characterization or therapeutic use; production of secreted polypeptides or chimeric polypeptides, antibody 
production, as markers for tissues in which the corresponding protein is preferentially expressed (either 
constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular 
weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map 
related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; 
as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR 
primers for genetic fingerprinting; for selecting and making oligomers for attachment to a "gene chip" or other 
support, including for examination for expression patterns; to raise anti-protein antibodies using DNA immunization 
techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the 
polynucleotide encodes a protein or polypeptide which binds or potentially binds to another protein or polypeptide 
(such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap 
assays (such as, for example, that described in Gyuris et al., [...] (1993)) to identify polynucleotides 
encoding the other protein or polypeptide with which binding occurs or to identify inhibitors of the binding interaction. 
[0515] The proteins or polypeptides provided by the present invention can similarly be used in assays to 
determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise 
antibodies or to elicit another immune response; as a reagent [...] to determine levels of the protein (or [...]) 
[...]; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively 
or at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate 
correlative receptors or ligands. Where the protein or polypeptide binds or potentially binds to another protein or 
polypeptide (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other 
protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins or polypeptides 
involved in these binding reactions can also be used to screen for peptide or small molecule inhibitors or agonists 
of the binding interaction. 

[0516] Any or all of these research utilities are capable of being developed into reagent grade or kit format 
for commercialization as research products. 

[...] to those skilled in the art [...] "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring 
Harbor Laboratory Press, Sambrook, J., E. F. Fritsch [...] 

[...] use as a protein or amino acid supplement, use as a carbon source, use as a nitrogen source and [...] In such 
cases the protein, polypeptide or polynucleotide of the invention can be added to the feed of a particular organism 
or can be administered as a separate solid or liquid preparation, such as in the form of powder, [...] 

r szgszsss^ *** « -Son* re's ffi&sii 

s wu, braXnfrr; c^eX d ^^ i , r h o, certain f r rred ^ 
^^r- Accordi ^ - -p- -~ n^.ttw; ^rrio s 
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NOs: SJEKi 7 " 8681 SeqUenC6S com P ,ementaf y *> sequences of SEQ ID NOs: 24-4100 and SEQ ID 

[...] sequence selected from the group [...] sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681
[...] sequences complementary to the [...]

[...] sequence selected from the [...] sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681
[...] sequences complementary to [...]

[...] comprising [...] coding sequence of a sequence selected from the group consisting of SEQ [...]

6. [...] of a sequence selected from the group consisting of SEQ ID [...]

[...] selected from the group consisting of SEQ ID [...]

8. [...] comprising a sequence selected from the group consisting of the [...]

9. [...] comprising a sequence selected from the group consisting of the [...]

[...] mature protein included in a sequence selected from [...]

[...] in a sequence selected from [...]
sequences complementary to the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

[...] comprising a sequence selected from the group consisting of the sequences of [...]

14. [...] isolated polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: [...]

[...] a mature protein of a polypeptide selected from the group [...]

[...] of a sequence selected from the group consisting [...]

[...] amino acids of a sequence selected from [...]

18. A method of making a cDNA comprising the steps of: 

contacting a collection of mRNA molecules from human cells with a primer comprising at least 15
consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681;
hybridizing said primer to an mRNA in said collection that encodes said protein;
reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA;
making a second cDNA strand complementary to said first cDNA strand; and
isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA
strand.

19. A purified cDNA obtainable by the method of Claim 18. 

20. The cDNA of Claim 19 wherein said cDNA encodes at least a portion of a human polypeptide.

21. A method of making a cDNA comprising the steps of: 

[...] selected from the group consisting of SEQ ID NOs: 24-4100 and [...]

identifying a cDNA which hybridizes to said detectable probe; and 
isolating said cDNA which hybridizes to said probe. 



22. A purified cDNA obtainable by the method of Claim 21.

23. The cDNA of Claim 22 wherein said cDNA encodes at least a portion of a human polypeptide. 

24. A method of making a cDNA comprising the steps of:

contacting a collection of mRNA molecules from cells with a first primer capable of hybridizing to the
polyA tail of said mRNA;

hybridizing said first primer to said polyA tail;

reverse transcribing said mRNA to make a first cDNA strand; 

making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at
[...] a sequence selected from the group consisting of SEQ ID NOs: [...]
isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand. 

25. A purified cDNA obtainable by the method of Claim 24. 

26. The cDNA of Claim 25 wherein said cDNA encodes at least a portion of a human polypeptide. 

27. The method of Claim 24, wherein the second cDNA strand is made by:

contacting said first cDNA strand with a first pair of primers, said first pair of primers [...] having a sequence
therein which is included within the sequence of said first primer;

performing a first polymerase chain reaction with said first pair of primers to generate a first PCR product;

[...] wherein said fourth and fifth [...] hybridize to sequences within said first PCR product; and
performing a second polymerase chain reaction, thereby generating a second PCR product. 

28. A purified cDNA obtainable by the method of Claim 27.

29. The cDNA of Claim 28 wherein said cDNA encodes at least a portion of a human polypeptide.

30. The method of Claim 24 wherein the second cDNA strand is made by: 

contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681; 

hybridizing said second primer to said first strand cDNA; and 

extending said hybridized second primer to generate said second cDNA strand. 

31. A purified cDNA obtainable by the method of Claim 30. 

32. The cDNA of Claim 28, wherein said cDNA encodes at least a portion of a human polypeptide. 

33. A method of making a polypeptide comprising the steps of: 

obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected
from the group consisting of SEQ ID NOs: 24-4100 or a cDNA which encodes [...]

inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter;

introducing said expression vector into a host cell whereby said host cell produces the protein encoded by
said cDNA; and

isolating said protein. 

34. An isolated protein obtainable by the method of Claim 33. 

35. A method of obtaining a promoter DNA comprising the steps of: 

obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group [...]

screening said genomic DNA to identify a promoter capable of directing transcription initiation; and 
isolating said DNA comprising said identified promoter. 

36. The method of Claim 35, wherein said obtaining step comprises walking from genomic DNA comprising a
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the
sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

37. The method of Claim 36, wherein said screening step comprises inserting genomic DNA located upstream of a
sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the
sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 [...]

[...] SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 [...]

39. An isolated promoter obtainable by the method of any one of Claims 34 to 38. 

[...] SEQ ID NOs: 8178-36681 [...] sequences complementary to the sequences of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and fragments comprising at least 15 consecutive nucleotides of said sequence.

41. The array of Claim 40 including therein at least two sequences selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said sequences.

42. The array of Claim 40 including therein at least five sequences selected from the group consisting of SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681, the sequences complementary to the sequences of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681, and fragments comprising at least 15 consecutive nucleotides of said
sequences.

43. An enriched population of recombinant nucleic acids, said recombinant nucleic acids comprising an insert 
nucleic acid and a backbone nucleic acid, wherein at least 5% of said insert nucleic acids in said population 
comprise a sequence selected from the group consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681
and the sequences complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 

44. A purified or isolated antibody capable of specifically binding to a polypeptide comprising a sequence selected
from the group consisting of SEQ ID NOs: 4101-8177. 

45. A purified or isolated antibody capable of specifically binding to a polypeptide comprising at least 10 
consecutive amino acids of a sequence selected from the group consisting of SEQ ID NOs: 4101-8177. 

46. An antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide 
comprising a contiguous span of at least 8 amino acids of any of SEQ ID NOs: 4101-8177, wherein said antibody is
polyclonal or monoclonal.

47. A computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic 
acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177. 

48. A computer system comprising a processor and a data storage device wherein said data storage device has 
stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177. 

49. The computer system of Claim 48 further comprising a sequence comparer and a data storage device having
reference sequences stored thereon. 

50. The computer system of Claim 49 wherein said sequence comparer comprises a computer program which 
indicates polymorphisms. 

51. The computer system of Claim 48 further comprising an identifier which identifies features in said sequence.

52. A method for comparing a first sequence to a reference sequence wherein said first sequence is selected from 
the group consisting of a nucleic acid code of SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of
SEQ ID NOs: 4101-8177 comprising the steps of: 

reading said first sequence and said reference sequence through use of a computer program which compares 
sequences; and 

determining differences between said first sequence and said reference sequence with said computer program. 

53. The method of Claim 52, wherein said step of determining differences between the first sequence and the 
reference sequence comprises identifying polymorphisms. 

54. A method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of
SEQ ID NOs: 24-4100 and 8178-36681 and a polypeptide code of SEQ ID NOs: 4101-8177 comprising the steps of:

reading said sequence through the use of a computer program which identifies features in sequences; and 

identifying features in said sequence with said computer program. 

55. A vector comprising a nucleic acid according to any one of Claims 1 to 12. 

56. A host cell containing a nucleic acid of Claim 55. 

57. A method of making a nucleic acid of Claims 1 comprising the steps of: 

introducing said nucleic acid into a host cell such that said nucleic acid is present in multiple copies in
each host cell; and

isolating said nucleic acid from said host cell. 

58. A method of making a nucleic acid of any one of Claims 1 to 12 comprising the step of sequentially linking 
together the nucleotides in said nucleic acids. 

59. A method of making a polypeptide of any one of Claims 13 to 17 wherein said polypeptide is 150 amino acids in
length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.

60. A method of making a polypeptide of any one of Claims 13 to 17 wherein said polypeptide is 120 amino acids in
length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.
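
As an illustration only of the comparison recited in Claims 52 and 53 (reading a first sequence and a reference
sequence, then determining differences between them), the following is a minimal sketch. It assumes the two
sequences are already aligned and of equal length. The function name and the example sequences are hypothetical:
they are not taken from the specification or from the sequence listing, and the sketch is not the comparison
program referred to in the claims.

```python
from typing import List, Tuple


def find_differences(first: str, reference: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference residue, observed residue) per mismatch.

    Assumption: both sequences are pre-aligned and have the same length,
    so no gap handling or alignment step is performed here.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    differences = []
    # Compare case-insensitively, position by position.
    for index, (observed, expected) in enumerate(zip(first.upper(), reference.upper())):
        if observed != expected:
            differences.append((index + 1, expected, observed))
    return differences


# Made-up example sequences (hypothetical, not from the sequence listing).
reference_sequence = "ATGGCCTTAGC"
first_sequence = "ATGGCTTTAGC"
print(find_differences(first_sequence, reference_sequence))
```

Each reported position is a candidate single-base difference of the kind that Claim 53 calls a polymorphism.
Sequences of unequal length would first need an alignment step, which this sketch deliberately leaves out.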
* Description of Transcription Factor Binding Sites present on promoters isolated from
SignalTag sequences

Promoter sequence P13H2 (546 bp):

Matrix

CMYB_01